web developer & system programmer

coder . cl

ramblings and thoughts on programming...


django and amazon s3

published: 19-01-2012 / updated: 19-01-2012
posted in: development, programming, python, tips
by Daniel Molina Wegener

Amazon S3 is a well-known web-based storage system offered as a SaaS by Amazon Web Services. In Django you can integrate that service through the storage interface provided by the django-storages package, but there are some considerations to keep in mind when using it, mainly regarding the Date header sent to the service on each read, write and similar operation, where you must send an up-to-date header with the proper time zone and format.

Once you have the django-storages package configured and working, you should fill in the Date header or the x-amz-date header each time you make a request to S3. The way to do that is not to let the module send the date automatically, but instead to write the date into the AWS_HEADERS settings variable yourself.
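
As a point of reference, a minimal settings sketch could look like the one below; the backend path, bucket name and chunk size are assumptions of this example, so adjust them to your own project.


# settings.py -- a minimal sketch; the backend path and the values
# below are placeholders, not a definitive configuration.
DEFAULT_FILE_STORAGE = 'storages.backends.s3.S3Storage'
AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-access-key'
AWS_STORAGE_BUCKET_NAME = 'your-bucket'
AWS_HEADERS = {}          # refreshed at request time by get_aws_date()
AWS_CHUNK_SIZE = 4096     # custom setting used by the chunked read below
TIME_ZONE = 'America/Santiago'   # whatever time zone your project uses

With that in place, a small helper can refresh those headers before each request to S3.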


import pytz
from datetime import datetime
from django.conf import settings

def get_aws_date():
    """
    Return the server date, localized to the server time zone and
    formatted to be used as the Date header with Amazon S3.
    """
    stz = pytz.timezone(settings.TIME_ZONE)
    dtm = stz.localize(datetime.now()).strftime("%a, %d %b %Y %H:%M:%S %z")
    settings.AWS_HEADERS['Date'] = dtm
    settings.AWS_HEADERS['x-amz-date'] = dtm
    return dtm

You must also keep this requirement in mind on every request you make to S3. Since django-storages only checks whether AWS_HEADERS already has the Date or x-amz-date header set, you must refresh that header each time you make a request to S3. This means you cannot do a single batch read() from the storage or a batch write() to it, because every chunk would reuse the previously set Date header, and the S3 authentication mechanism rejects it as inconsistent. If we use S3 as the default storage, the example below will fail, because it sends the file chunks with the same Date header that was set earlier.


from django.core.files.storage import default_storage

if default_storage.exists('test-large-file.mp3'):
    # local copy opened for binary writing
    mp3file = open('test-large-file.mp3', 'wb')
    s3file = default_storage.open('test-large-file.mp3')
    # a single batch read(): every chunk reuses the same Date header
    mp3file.write(s3file.read())
    mp3file.close()
    s3file.close()

So, you need to read from the storage in small chunks, as in the example below.


from django.conf import settings
from django.core.files.storage import default_storage

if default_storage.exists('test-large-file.mp3'):
    get_aws_date()  # refresh the Date headers before touching S3
    # local copy opened for binary writing
    mp3file = open('test-large-file.mp3', 'wb')
    s3file = default_storage.open('test-large-file.mp3')
    buff = s3file.read(settings.AWS_CHUNK_SIZE)
    while buff:
        mp3file.write(buff)
        try:
            buff = s3file.read(settings.AWS_CHUNK_SIZE)
        except Exception as exc:
            # stop on any read error and close the files below
            buff = None
    mp3file.close()
    s3file.close()

But you must also consider a few issues here. Since AWS_HEADERS is a global variable, writing to it on every request adds some overhead because of the GIL, and it can break a threaded application if two requests write to it at the same time, even if you use locks and Django can cope with parallel writes to that variable.
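
As a small mitigation, not a definitive fix, one option is to wrap the helper with a lock from the standard threading module, so only one thread updates AWS_HEADERS at a time; this sketch assumes the get_aws_date() helper shown above.


import threading

# a single module-level lock guarding the shared AWS_HEADERS dictionary
_aws_headers_lock = threading.Lock()

def set_aws_date():
    """
    Refresh the Date headers in settings.AWS_HEADERS while holding
    a lock, so concurrent requests do not interleave their writes.
    Assumes the get_aws_date() helper defined above.
    """
    with _aws_headers_lock:
        return get_aws_date()

So, be careful when reading and writing large files from S3, and take a look at how the Date header is sent on each request.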


def _add_aws_auth_header(self, headers, method,
                         bucket, key, query_args):
    if not headers.has_key('Date'):
        headers['Date'] = time.strftime("%a, %d %b %Y %X GMT",
                                        time.gmtime())

    c_string = canonical_string(method, bucket, key,
                                query_args, headers)
    headers['Authorization'] = \
        "AWS %s:%s" % (self.aws_access_key_id,
                       encode(self.aws_secret_access_key, c_string))

On machines where the LC_ALL environment variable is set to something other than C, the %X specifier produces a locale-dependent date format, which keeps your application from working with S3. This is a well-known bug reported on the django-storages issue tracker (a small demonstration of the locale effect follows the patch below). So, the right implementation should be as follows.


# these imports must be added at the top of the patched S3.py
import pytz
from datetime import datetime
from django.conf import settings

def _add_aws_auth_header(self, headers, method,
                         bucket, key, query_args):
    if 'Date' not in headers:
        stz = pytz.timezone(settings.TIME_ZONE)
        dtm = stz.localize(datetime.now()).strftime("%a, %d %b %Y %H:%M:%S %z")
        headers['Date'] = dtm

    c_string = canonical_string(method, bucket, key,
                                query_args, headers)
    headers['Authorization'] = \
        "AWS %s:%s" % (self.aws_access_key_id,
                       encode(self.aws_secret_access_key, c_string))
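
To see the original problem, here is a small, hypothetical demonstration of what strftime does under a non-C locale (it assumes an es_CL locale is installed on the machine); the localized output makes the header invalid for the S3 authentication mechanism.


import locale
import time

# with the default C locale the header has the expected English format
locale.setlocale(locale.LC_ALL, "C")
print(time.strftime("%a, %d %b %Y %X GMT", time.gmtime()))
# e.g. Thu, 19 Jan 2012 14:30:05 GMT

# with a localized LC_ALL the same call produces a localized string,
# which S3 rejects
locale.setlocale(locale.LC_ALL, "es_CL.UTF-8")  # assumes this locale exists
print(time.strftime("%a, %d %b %Y %X GMT", time.gmtime()))
# e.g. jue, 19 ene 2012 14:30:05 GMT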

Good luck using S3.


6 comments to “django and amazon s3”

  1. I wish I could get this working properly. When django requests the image I have stored on S3 it will never get a 304, but when I copy the link to the image and paste it in a new window it will first 200, then any requests after will 304… Frustrated. Any tips?

    I have set the AWS_HEADER variable in my settings like specified in the django-storage docs. Are you suggesting that everything you mentioned above must be done in order to get a 304?

  2. Check that S3 is not using versioning if you are replacing files. Also, there is a bug, as linked in the post; please check issue #56 of django-storages and follow my patch, rather than the example lines in the issue comments.

    If you do not want to use the localized date, you should use a GMT version that works, as follows:

        def _add_aws_auth_header(self, headers, method, bucket, key, query_args):
            if not headers.has_key('Date'):
                headers['Date'] = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime())
    
            c_string = canonical_string(method, bucket, key, query_args, headers)
            headers['Authorization'] = \
                "AWS %s:%s" % (self.aws_access_key_id, encode(self.aws_secret_access_key, c_string))
    
  3. I’m confused, what is actually referencing the S3.py file that would make these changes make a difference? These changes are to be made on S3.py, right?

  4. maybe i should mention i am using.. DEFAULT_FILE_STORAGE = ‘storages.backends.s3boto.S3BotoStorage’

  5. Well, if you are using the python-boto library, this patch will not work for you. I have not tested boto with django-storages, sorry. Try the other backend library for django-storages. With boto you need to configure the /etc/boto.cfg file.
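
    For reference, a minimal /etc/boto.cfg sketch would look like this (the key values are placeholders):

        [Credentials]
        aws_access_key_id = your-access-key-id
        aws_secret_access_key = your-secret-access-key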

  6. [...] EDIT 1: I read this article and couldn’t translate it very well.. http://coder.cl/2012/01/django-and-amazon-s3/comment-page-1/ [...]
