Originally published at https://gist.github.com/neo01124/dc31d0b08bd7ac6906d06197e20dc9b6
This must be at least the fifth time I’ve written this kind of code for different projects, so I decided to make a note of it for good.
This might seem like a very trivial task until you realise that S3 has no concept of a folder hierarchy. S3 only has buckets and keys. Buckets are flat, i.e. there are no folders; the whole path (folder1/folder2/folder3/file.txt) is the key for your object. The S3 UI presents it like a file browser, but there aren’t any folders — inside a bucket there are only keys. From the S3 docs:
> The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does.
The challenge in this task is essentially to recreate the directory structure (folder1/folder2/folder3/) encoded in the key on the local filesystem before downloading the actual content of the S3 object.
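To make that mapping concrete, here is a minimal sketch (the key and local root below are made up) of how a key’s prefix gets turned into real directories before the object body is written:

import os

key = "folder1/folder2/folder3/file.txt"  # an S3 key: the "folders" are just part of the name
local_root = "/tmp/s3_bucket"             # hypothetical local destination

# Split the key into its pseudo-folder prefix and the file name...
prefix, filename = os.path.split(key)

# ...then materialise the prefix as real directories before writing the file.
target_dir = os.path.join(local_root, prefix)
os.makedirs(target_dir, exist_ok=True)
print(os.path.join(target_dir, filename))  # -> /tmp/s3_bucket/folder1/folder2/folder3/file.txt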
Option 1 – Shell command
The AWS CLI will do this for you with a sync operation:
aws s3 sync s3://yourbucket /local/path
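If you only need part of the bucket, or you want to use a named credentials profile, the same command accepts a key prefix and the usual --profile flag (the bucket, prefix, and profile names here are placeholders):

aws s3 sync s3://yourbucket/folder1/ /local/path/folder1 --profile your_profile_name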
Option 2 – Python
- Install boto3
- Create an IAM user with a policy similar to the one below (note that s3:GetObject applies to objects, so the object ARN your_bucket_name/* is needed alongside the bucket ARN)
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts", "s3:GetObject", "s3:GetBucketLocation", ], "Resource": [ "arn:aws:s3:::your_bucket_name" ] } ] }
- Create a profile in ~/.aws/credentials with the access keys of this IAM user, as explained in the boto3 documentation (see the sketch just below)
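A minimal sketch of such a profile; the profile name and key values are placeholders:

[your_profile_name]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx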
- Code
import errno
import os

import boto3

def mkdir_p(path):
    # mkdir -p functionality from https://stackoverflow.com/a/600612/2448314
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise

def get_s3_path_filename(key):
    # Split an S3 key into its "folder" prefix and the file name after the last '/'.
    key = str(key)
    path, _, filename = key.rpartition('/')
    return path, filename

def download_s3_bucket(bucket_name, local_folder, aws_user_with_s3_access):
    session = boto3.Session(profile_name=aws_user_with_s3_access)
    s3 = session.resource('s3')
    s3_bucket = s3.Bucket(bucket_name)
    # Iterate over every object in the bucket; the collection handles pagination.
    for obj in s3_bucket.objects.all():
        s3_path, s3_filename = get_s3_path_filename(obj.key)
        if not s3_filename:  # skip zero-byte "folder" placeholder keys that end in '/'
            continue
        local_folder_path = os.path.join(os.curdir, local_folder, s3_path)
        local_fullpath = os.path.join(local_folder_path, s3_filename)
        mkdir_p(local_folder_path)
        s3_bucket.download_file(obj.key, local_fullpath)

download_s3_bucket(bucket_name="your_bucket_name",
                   local_folder="/tmp/s3_bucket",
                   aws_user_with_s3_access="your_profile_name")
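If you only want one “folder” rather than the whole bucket, the same loop works with a filtered collection instead of objects.all(); the profile, bucket, and prefix below are placeholders:

import boto3

session = boto3.Session(profile_name="your_profile_name")
s3_bucket = session.resource('s3').Bucket("your_bucket_name")

# filter(Prefix=...) restricts the listing to keys under one pseudo-folder.
for obj in s3_bucket.objects.filter(Prefix="folder1/folder2/"):
    print(obj.key)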
I’d make a package if there is enough interest 🙂