6 ways to make working with DynamoDB an awesome experience

I've had a pleasant experience working with DynamoDB. Amazon did fantastic work in the past few years improving their NoSQL offering. It's now truly powerful and versatile.

Thanks to our client, ClearCare, for enabling me to work with DynamoDB and share my lessons. Together, we created an open source library called cc_dynamodb3 to help others make the most of their Python integration.

Background

As part of ClearCare's broader move to SOA (and the now more popular term, microservices), we had more freedom to choose the right tools for the job.

The need to launch a new feature created an opportunity to experiment with a separate, simpler datastore. Since they were already using so much of the AWS stack, DynamoDB was a natural choice.

We set up the new feature as a separate service, with its own servers, separate domain and database, interacting with the main product via APIs.

End result

Thanks largely to DynamoDB's scalability, plus a solid API integration (mostly via JSON), we were able to launch the feature on a very tight deadline and have it scale immediately.

Throughout 6+ months of enhancements, we rarely had fires to put out, and almost no downtime. Remarkable, considering usage grew ~100x.

We learned a lot along the way, and I'm happy to share some of those lessons with you right now!

Alright, help me make the most of working with DynamoDB and Python!

The following tips may help you work with Amazon Web Services' DynamoDB and Python (including Django or Flask).

1. Use boto3

As of June 2015, Amazon recommends boto3 going forward. boto2 is being deprecated, and boto3 offers official support and a much cleaner, more Pythonic API for working with AWS!

Upgrading from boto2 to boto3 is fairly easy, although I strongly suggest you write tests for the affected code. Unit tests are essential, and you should have at least one for each low-level piece of code that used boto2 directly.

I also suggest at least one or two higher-level tests, and manually trying out a code path that's core to the business (to avoid breaking key user paths and waking up the Ops department ;).

Check out how we implemented the boto3 connection.
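As a rough illustration of what such a connection helper can look like, here is a minimal sketch. It is not the cc_dynamodb3 implementation: the function names, the environment variable names and the fallback region are all assumptions made up for this example. Only boto3.resource and its region_name and endpoint_url arguments are real boto3 API.

```python
import os


def dynamodb_config(env=None):
    """Build keyword arguments for boto3.resource('dynamodb', ...).

    The AWS_REGION and DYNAMODB_ENDPOINT_URL variable names, and the
    'us-west-2' fallback region, are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    config = {'region_name': env.get('AWS_REGION', 'us-west-2')}
    endpoint_url = env.get('DYNAMODB_ENDPOINT_URL')
    if endpoint_url:
        # Lets tests point at a local DynamoDB instead of the real service.
        config['endpoint_url'] = endpoint_url
    return config


def get_dynamodb(env=None):
    """Return a boto3 DynamoDB service resource built from the config above."""
    # Imported lazily so dynamodb_config() can be unit tested without boto3.
    import boto3
    return boto3.resource('dynamodb', **dynamodb_config(env))
```

With something like this in place, get_dynamodb().Table('books') would give you a table object to work with ('books' being a hypothetical table name). Keeping the configuration in a separate pure function makes it easy to cover with the unit tests suggested above.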

2. Validate your data

DynamoDB supports several data types natively that your language will probably handle a bit differently.

In our case, it's nice to convert our data to native Python objects where appropriate.

Fortunately, we have that work all done and tested as part of cc_dynamodb3. Check out models.py here and here especially.

We used schematics to create a light ORM.
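To illustrate the kind of conversion involved: boto3 hands DynamoDB numbers back as decimal.Decimal values, which is rarely what the rest of your code expects. The sketch below is not how cc_dynamodb3 does it (to_native is a hypothetical helper). It is just one simple way to walk an item and turn whole Decimals into int and the rest into float, assuming the loss of Decimal precision is acceptable for your data.

```python
from decimal import Decimal


def to_native(value):
    """Recursively convert Decimals in a DynamoDB item to int or float.

    Hypothetical helper for illustration; converting to float gives up
    Decimal's exact precision.
    """
    if isinstance(value, Decimal):
        # Whole numbers become int, everything else becomes float.
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {key: to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    if isinstance(value, (set, frozenset)):
        return {to_native(item) for item in value}
    return value
```

A schema library such as schematics goes further than this, since it can also validate required fields and parse dates, which is why we built our light ORM on top of it.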

3. Consider a UUID HashKey for your primary key

A table's primary key cannot be changed once created. There are lots of resources out there to help you design your tables, and I suggest you research your use cases ahead of time.

In our experience, the safest choice is a string UUID Hash Key. It gives you 100% flexibility over how you're going to uniquely identify your data. You can always add, modify or remove GSIs to improve performance for specific query operations.

Here are some easily avoidable mistakes:

  • Don't include a RangeKey in an index unless you actually have multiple records with the same HashKey. It makes it a pain to GetItem directly, since you'll need both the Hash and Range keys to find it. (PS. range key = sort key)
  • Choose the primary key carefully. You cannot change it once the table is created, and fixing a bad choice means migrating your data to a new table.
  • When you update an item via PutItem, DynamoDB will create a new entry if the primary key has changed. The old item will remain unchanged!
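Here is a minimal sketch of the string UUID hash key idea. The table name 'books', the attribute name 'id' and the new_item helper are assumptions for this example. The KeySchema and AttributeDefinitions shapes are the ones boto3's create_table expects, but the dict is deliberately incomplete: a real call also needs throughput or billing settings.

```python
import uuid

# Key-related arguments for create_table, for a table keyed by a string
# UUID hash key. 'books' and 'id' are assumed names; throughput/billing
# settings are omitted, so this is not a complete create_table call.
BOOKS_TABLE_SCHEMA = {
    'TableName': 'books',
    'KeySchema': [{'AttributeName': 'id', 'KeyType': 'HASH'}],
    'AttributeDefinitions': [{'AttributeName': 'id', 'AttributeType': 'S'}],
}


def new_item(**attributes):
    """Return an item dict with a freshly generated string UUID hash key."""
    item = dict(attributes)
    item['id'] = str(uuid.uuid4())
    return item
```

Because the key carries no business meaning, nothing about your data model is baked into it, and any attribute you later want to query by can get its own GSI.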

4. Consider a composite RangeKey for extra flexibility

You can use a begins_with filter on a RangeKey, but you cannot use it on a HashKey.

For example, say you have a book library. If you first and foremost care to keep your books by publisher, that could be your HashKey. Then, you may sort the books by year of publication, and uniquely identify them by their ISBN. So you can have:

  • HashKey: publisher ID (or publisher name), e.g. HarperBusiness
  • RangeKey: year + ISBN, e.g. 1995-9512512

You can find all HarperBusiness books published in 1995 via a single query using begins_with. What great performance!
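The library example could be sketched roughly as follows. The attribute names publisher and year_isbn and both helper functions are assumptions for illustration. The returned dict uses the standard DynamoDB Query parameters (KeyConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues), so it should be usable as table.query(**kwargs) on a boto3 Table.

```python
def make_range_key(year, isbn):
    """Compose the RangeKey value, e.g. '1995-9512512'."""
    return '{}-{}'.format(year, isbn)


def books_by_publisher_and_year(publisher, year):
    """Build Query arguments for all books by a publisher in a given year.

    Assumes a table with HashKey 'publisher' and RangeKey 'year_isbn'.
    """
    return {
        # Equality on the HashKey, begins_with on the RangeKey.
        'KeyConditionExpression': '#pub = :pub AND begins_with(#rk, :prefix)',
        # Placeholders avoid clashes with DynamoDB reserved words.
        'ExpressionAttributeNames': {'#pub': 'publisher', '#rk': 'year_isbn'},
        'ExpressionAttributeValues': {
            ':pub': publisher,
            ':prefix': '{}-'.format(year),
        },
    }
```

Putting the year first is what makes this work: begins_with can only match a prefix, so the part you want to filter on has to lead the RangeKey.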

5. Scans and queries do not return all data in the table.

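A single scan or query call returns at most 1MB of data. If there is more, the response includes a LastEvaluatedKey, and you need to pass it back as ExclusiveStartKey to fetch the next page. Here is an example of how we handle this in the light ORM from cc_dynamodb3: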

from schematics.models import Model


class DynamoDBModel(Model):
    # more stuff here...

    @classmethod
    def all(cls):
        response = cls.table().scan()
        # A single scan returns at most 1MB of data, so keep scanning
        # until DynamoDB stops returning a LastEvaluatedKey.
        while True:
            metadata = response.get('ResponseMetadata', {})
            for row in response['Items']:
                yield cls.from_row(row, metadata)
            if not response.get('LastEvaluatedKey'):
                break
            response = cls.table().scan(
                ExclusiveStartKey=response['LastEvaluatedKey'],
            )

Caveat: Item.all() may make multiple requests to DynamoDB, so it carries a hidden cost in both time and read capacity. Note the use of yield to evaluate the results lazily: you retrieve only as much data as you actually consume, and avoid scanning the whole table in one go.
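The same pattern applies to queries, and the lazy behaviour is easy to see outside the ORM. The sketch below is an illustration rather than cc_dynamodb3 code: paginate is a hypothetical helper that accepts any callable shaped like Table.scan or Table.query, and FakeTable is an in-memory stand-in so the example runs without AWS.

```python
from itertools import islice


def paginate(operation, **kwargs):
    """Lazily yield items from a paginated scan- or query-like callable."""
    while True:
        response = operation(**kwargs)
        for item in response.get('Items', []):
            yield item
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        # Ask for the next page, starting where this one stopped.
        kwargs['ExclusiveStartKey'] = last_key


class FakeTable(object):
    """In-memory stand-in for a table that returns two items per page."""

    def __init__(self, items, page_size=2):
        self.items = items
        self.page_size = page_size
        self.calls = 0  # number of scan requests made so far

    def scan(self, ExclusiveStartKey=None):
        self.calls += 1
        start = ExclusiveStartKey['index'] if ExclusiveStartKey else 0
        end = start + self.page_size
        response = {'Items': self.items[start:end]}
        if end < len(self.items):
            response['LastEvaluatedKey'] = {'index': end}
        return response


table = FakeTable(['a', 'b', 'c', 'd', 'e'])
# Taking three items only triggers two of the three possible requests.
first_three = list(islice(paginate(table.scan), 3))
```

With a real boto3 table you would pass table.scan or table.query (plus its usual keyword arguments) as the operation. Stopping early, as islice does here, means the remaining pages are never requested.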

In practice, you don't want to perform table scans or large queries on the main server thread anyway.

6. Include created and updated columns in each table

In these days of big data and analytics, I suggest always having a created and an updated column.

In our light ORM, we can do this with a BaseModel:

from schematics import types as fields
from cc_dynamodb3.models import DynamoDBModel


class BaseModel(DynamoDBModel):
    created = fields.DateTimeType(default=DynamoDBModel.utcnow)
    updated = fields.DateTimeType(default=DynamoDBModel.utcnow)

    def save(self, overwrite=False):
        self.updated = DynamoDBModel.utcnow()
        return super(BaseModel, self).save(overwrite=overwrite)

Then, in your code:

from schematics import types as fields

from myproject.db import BaseModel


class Book(BaseModel):
    publisher = fields.StringType(required=True)

This makes sure all your models inheriting from BaseModel will have the two columns automatically populated. Ta-da!

If you'd enjoy working on these types of projects while having a chance to help make aging better, ClearCare is hiring full-time engineers! Check out their careers section.
