Download S3 bucket

Originally published at https://gist.github.com/neo01124/dc31d0b08bd7ac6906d06197e20dc9b6

This must be at least the 5th time I’ve written this kind of code for different projects, so I decided to make a note of it for good.

This might seem like a very trivial task until you realise that S3 has no concept of folder hierarchy. S3 only has the concept of buckets and keys. Buckets are flat, i.e. there are no folders. The whole path (folder1/folder2/folder3/file.txt) is the key for your object. The S3 console presents it like a file browser, but there aren’t any folders. Inside a bucket there are only keys. From the S3 docs:

> The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does.

The challenge in this task is to essentially create the directory structure (folder1/folder2/folder3/) in the key before downloading the actual content of the S3 object.
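To illustrate, here is a quick sketch of how a key maps to a local directory and filename (the keys and the local root used here are made up):

```python
import os

def split_key(key):
    # "folder1/folder2/file.txt" -> ("folder1/folder2", "file.txt");
    # a top-level key like "file.txt" -> ("", "file.txt")
    prefix, _, filename = key.rpartition("/")
    return prefix, filename

prefix, filename = split_key("folder1/folder2/folder3/file.txt")
local_dir = os.path.join("/tmp/s3_bucket", prefix)   # directory to create locally
local_path = os.path.join(local_dir, filename)       # where the object body goes
```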

Option 1 – Shell command

The AWS CLI will do this for you with a sync operation:

aws s3 sync s3://yourbucket /local/path


Option 2 – Python

  • Install boto3
  • Create an IAM user with a policy like the following (note that s3:GetObject acts on objects, so the object-level resource your_bucket_name/* is needed alongside the bucket ARN)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::your_bucket_name",
                "arn:aws:s3:::your_bucket_name/*"
            ]
        }
    ]
}


  • Create a profile in ~/.aws/credentials with the access details of this IAM user, as explained in the boto3 documentation
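For reference, such a profile in ~/.aws/credentials looks like this (the profile name and key values below are placeholders):

```ini
[your_profile_name]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```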
  • Code
import errno
import os

import boto3

def mkdir_p(path):
    # mkdir -p functionality, from https://stackoverflow.com/a/600612/2448314
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise

def get_s3_path_filename(key):
    # Split "folder1/folder2/file.txt" into ("folder1/folder2", "file.txt").
    # str.replace() would misbehave when the filename also occurs in the path.
    key = str(key)
    path, _, filename = key.rpartition('/')
    return path, filename

def download_s3_bucket(bucket_name, local_folder, aws_user_with_s3_access):
    session = boto3.Session(profile_name=aws_user_with_s3_access)
    s3 = session.resource('s3')
    s3_bucket = s3.Bucket(bucket_name)
    for obj in s3_bucket.objects.all():
        if obj.key.endswith('/'):
            # zero-byte "folder" placeholders created by the console; nothing to download
            continue
        s3_path, s3_filename = get_s3_path_filename(obj.key)
        local_folder_path = os.path.join(os.curdir, local_folder, s3_path)
        local_fullpath = os.path.join(local_folder_path, s3_filename)
        mkdir_p(local_folder_path)
        s3_bucket.download_file(obj.key, local_fullpath)

download_s3_bucket(bucket_name="your_bucket_name", local_folder="/tmp/s3_bucket", aws_user_with_s3_access="your_profile_name")

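The directory-creation logic can be exercised locally without touching AWS; here is a minimal dry run over made-up keys, with a stand-in for download_file:

```python
import os
import tempfile

keys = ["folder1/a.txt", "folder1/sub/b.txt", "c.txt", "empty_folder/"]

root = tempfile.mkdtemp()
for key in keys:
    if key.endswith("/"):            # console "folder" placeholder, nothing to download
        continue
    prefix, _, filename = key.rpartition("/")
    folder = os.path.join(root, prefix)
    os.makedirs(folder, exist_ok=True)                 # Python 3.2+ replacement for mkdir_p
    open(os.path.join(folder, filename), "w").close()  # stand-in for download_file
```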

I’d make this into a package if there is enough interest 🙂
