Downloading multiple files from Amazon S3

Situation

I have hundreds (even thousands) of small files (~50 KB each) on Amazon S3, separated into buckets by day.

Problem

Through my Java application, I need to download all the files from a given period and deliver them to the front end. My cloud machine has limited memory and disk resources (2 GB of RAM and 5 GB of disk).
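Because the files are grouped by day, fetching "a given period" means visiting one bucket or prefix per day. The sketch below assumes a hypothetical naming scheme (`logs/yyyy-MM-dd/`) and only builds the list of daily prefixes with `java.time`; the names `DailyPrefixes`, `prefixesFor`, and the `logs/` root are illustrative and not from the original question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyPrefixes {
    // Hypothetical naming scheme: one prefix (or bucket) per day, e.g. "logs/2018-06-06/".
    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    /** Returns one prefix per day in [from, to], both ends inclusive. */
    public static List<String> prefixesFor(LocalDate from, LocalDate to, String root) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        List<String> prefixes = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            prefixes.add(root + d.format(DAY) + "/");
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Prints logs/2018-06-05/, logs/2018-06-06/ and logs/2018-06-07/, one per line.
        prefixesFor(LocalDate.of(2018, 6, 5), LocalDate.of(2018, 6, 7), "logs/")
                .forEach(System.out::println);
    }
}
```

With the AWS SDK for Java v1, each prefix could then be passed to `AmazonS3.listObjectsV2(bucket, prefix)` to get that day's keys; a single listing call returns at most 1,000 keys, so busy days would need pagination.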

Solution 1

Download the files one by one and pass them to the front end. This is somewhat inefficient, since it involves thousands of small files.

Solution 2

Download the files one by one and compress them into a zip (splitting the zip into parts if needed, given the machine's limits), upload this zip to Amazon S3, and deliver only the zip link to the front end.
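Solution 2 can be done without holding whole files in memory: each object is streamed straight into a `java.util.zip.ZipOutputStream` on local disk, and a new part is started once a size threshold is reached. In this sketch, the `ObjectSource` interface is a hypothetical stand-in for reading an S3 object's content (with SDK v1, something like `s3.getObject(bucket, key).getObjectContent()`), and the part naming and threshold are assumptions.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipParts {
    /** Hypothetical stand-in for opening an S3 object's content stream. */
    public interface ObjectSource {
        InputStream open(String key) throws IOException;
    }

    /**
     * Streams each object into zip files under outDir. Before each file, if the current
     * part's uncompressed payload has reached maxPartBytes, a new part is started, so a
     * part can overshoot by at most one file. Only an 8 KB copy buffer is held in memory.
     */
    public static List<Path> zipInParts(List<String> keys, ObjectSource source,
                                        Path outDir, long maxPartBytes) throws IOException {
        List<Path> parts = new ArrayList<>();
        ZipOutputStream zip = null;
        long written = 0;
        byte[] buf = new byte[8192];
        try {
            for (String key : keys) {
                if (zip == null || written >= maxPartBytes) {
                    if (zip != null) {
                        zip.close(); // finish the previous part
                    }
                    Path part = outDir.resolve("part-" + (parts.size() + 1) + ".zip");
                    parts.add(part);
                    zip = new ZipOutputStream(new BufferedOutputStream(Files.newOutputStream(part)));
                    written = 0;
                }
                zip.putNextEntry(new ZipEntry(key));
                try (InputStream in = source.open(key)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        zip.write(buf, 0, n);
                        written += n;
                    }
                }
                zip.closeEntry();
            }
        } finally {
            if (zip != null) {
                zip.close(); // closing an already-closed ZipOutputStream is a no-op
            }
        }
        return parts;
    }
}
```

Each resulting part could then be uploaded (for example with `AmazonS3.putObject(bucket, key, file)` in SDK v1) and only the links returned to the front end. The threshold counts uncompressed bytes, so the parts are usually smaller on disk than the limit suggests.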

Question

Is there another solution that someone has already used, some native AWS feature, or a more efficient idea to solve this problem?

Author: Comunidade, 2018-06-06

1 answer

If I understood correctly, the problem is performance. Some things that I believe can help:

1 - Use a Lambda function to build the zip: https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

2 - Deliver to the client via CloudFront (CDN): https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/MigrateS3ToCloudFront.html

3 - Deliver via BitTorrent: https://docs.aws.amazon.com/AmazonS3/latest/dev/S3Torrent.html

4 - Use the TransferManager class to download in parallel: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-s3-transfermanager.html

5 - Avoid such small files; consider aggregating them into larger batches with Lambda or Glue.
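For item 4, `TransferManager` in SDK v1 already parallelizes transfers (for example, `downloadDirectory(bucket, keyPrefix, destinationDir)` downloads everything under a prefix). If you want explicit control over concurrency on a 2 GB machine, the same idea can be sketched with a fixed thread pool from `java.util.concurrent`. This is a hand-rolled alternative, not `TransferManager` itself, and the `fetch` function is a hypothetical placeholder for the actual S3 GET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetch {
    /**
     * Fetches every key using at most `threads` concurrent calls and returns the results
     * in the same order as `keys`. `fetch` is a hypothetical stand-in for an S3 GET.
     */
    public static <T> List<T> fetchAll(List<String> keys, Function<String, T> fetch, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(pool.submit(() -> fetch.apply(key)));
            }
            List<T> results = new ArrayList<>(keys.size());
            for (Future<T> f : futures) {
                results.add(f.get()); // rethrows any fetch failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdownNow(); // stop remaining work if something failed
        }
    }
}
```

Collecting every result in a list keeps all file contents in memory at once; with thousands of ~50 KB files on a 2 GB machine, it may be safer to have `fetch` write each object to a temporary file and return its path.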
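For item 5, the core of an aggregation job is deciding which small files go together. A minimal sketch, assuming batches are filled in listing order up to a target size (the `Entry` type and the size rule are assumptions for illustration); a Lambda or Glue job would then concatenate or zip each batch into one larger S3 object.

```java
import java.util.ArrayList;
import java.util.List;

public class SmallFileBatcher {
    /** Hypothetical listing entry: an object key and its size in bytes. */
    public static final class Entry {
        final String key;
        final long size;

        public Entry(String key, long size) {
            this.key = key;
            this.size = size;
        }
    }

    /**
     * Groups entries, in listing order, into batches whose total size stays at or below
     * targetBytes. An entry larger than targetBytes ends up in a batch of its own.
     */
    public static List<List<String>> batch(List<Entry> entries, long targetBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Entry e : entries) {
            if (!current.isEmpty() && currentBytes + e.size > targetBytes) {
                batches.add(current); // current batch is full; start a new one
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(e.key);
            currentBytes += e.size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With ~50 KB files and a target of, say, 5 MB, each batch would hold about 100 files, so the Java application would fetch roughly a hundred times fewer objects.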

Delivering the files directly through S3/CloudFront, rather than routing them through your own server, is better in terms of cost, performance, and security.

Author: Julio Faerman, 2018-06-07 08:43:18