AWS S3 Batch Operations is a solution to quickly process large quantities of ETL data by invoking a Lambda Function, however you first need to create a manifest file describing all the objects you want to process. I couldn’t find any quick solutions to easily create these manifests online so I put together a solution in Python. You can find the GitHub code here.
Instructions to generate an S3 Manifest CSV
For creating a csv manifest list of all files in an S3 bucket with a certain prefix and suffix.
- Compatible for use with S3 Batch Operations.
- Manfest uses bucket, key schema.
- Manifest is uploaded to a target location on S3.
Executing from the CLI:
$ python generate_manifest.py bucket_name prefix suffix manifest_bucket manifest_key
$ python generate_manifest.py my-bucket path/to/data/2020-09-24/ .xml manifest-bucket 2020-10-26_manifest.csv