27 May 2022

DynamoDB

How it works

  • Data is stored in tables; items play the role of rows.
  • Items are uniquely identified by their primary key, which consists of a partition key and an optional sort key.
  • There is no schema for the rest of the attributes: different items can have different attributes, or different types for the same attribute (see the sketch below).
  • Sharding is automatic, based on the hash of the partition key.
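A minimal sketch with boto3, assuming a hypothetical `books` table with `author` as the partition key and `title` as the sort key:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("books")  # hypothetical table: PK=author, SK=title

# Two items in the same table with different attribute sets: only the
# primary key is fixed, everything else is schemaless.
table.put_item(
    Item={"author": "Kurt Vonnegut", "title": "Cat's Cradle", "year": 1963}
)
table.put_item(
    Item={"author": "Ursula K. Le Guin", "title": "The Dispossessed",
          "tags": ["sci-fi", "utopia"]}
)
```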

Data consistency

  • Eventual consistency is the default behavior; strong consistency is an option. Strongly consistent reads are twice as expensive, and sometimes they may not be available (see the sketch below).
  • Advice: try to design around eventual consistency.
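Consistency is chosen per read in boto3; a sketch against the same hypothetical `books` table:

```python
import boto3

table = boto3.resource("dynamodb").Table("books")  # hypothetical table
key = {"author": "Kurt Vonnegut", "title": "Cat's Cradle"}

# Eventually consistent read: the default, half the RCU cost.
table.get_item(Key=key)

# Strongly consistent read: twice the cost, and not supported on
# global secondary indexes.
table.get_item(Key=key, ConsistentRead=True)
```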

Throughput

  • Measured in RCUs (read capacity units) and WCUs (write capacity units).
  • 1 RCU - one strongly consistent read or two eventually consistent reads per second, for items up to 4 KB (see the worked example below).
  • 1 WCU - one standard write per second, for items up to 1 KB.
  • Tables are created with the minimal provisioned capacity of 1 RCU and 1 WCU, which costs around $0.59 per month, and can scale up to 10 RCU or 10 WCU.
  • Requests are throttled once the provisioned capacity is exceeded.
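A quick worked example of the capacity math (the item size is made up):

```python
import math

item_size_kb = 6  # hypothetical item size

# Reads are billed in 4 KB units, rounded up.
strong_read = math.ceil(item_size_kb / 4)   # 2 RCUs per strongly consistent read
eventual_read = strong_read / 2             # 1 RCU per eventually consistent read

# Writes are billed in 1 KB units, rounded up.
write = math.ceil(item_size_kb / 1)         # 6 WCUs per standard write
```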

Querying data

  • Scan. Sort of like SCAN in Redis: it iterates over all the records in the table and returns only the ones matching the filter, but you pay for everything scanned. Avoid it.
  • Query. Specify the partition key and, optionally, a sort key expression. Cheaper and faster (see the sketch below).
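A sketch of both operations, again assuming the hypothetical `books` table:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("books")

# Scan: reads the entire table and filters afterwards, so you pay
# for every item scanned, not only for what is returned. Avoid.
table.scan(FilterExpression=Attr("year").gt(1960))

# Query: goes straight to one partition, optionally narrowed down
# by a sort key expression. Cheaper and faster.
table.query(
    KeyConditionExpression=Key("author").eq("Kurt Vonnegut")
    & Key("title").begins_with("Cat")
)
```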

Local secondary indexes (LSI)

An LSI shares partitions with the table data: it has the same partition key but a different sort key. It can only be created together with the table, so it is of limited use.
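Because an LSI can only be defined at table creation time, it shows up in `create_table`; a sketch with made-up index and attribute names:

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="books",
    AttributeDefinitions=[
        {"AttributeName": "author", "AttributeType": "S"},
        {"AttributeName": "title", "AttributeType": "S"},
        {"AttributeName": "year", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "author", "KeyType": "HASH"},
        {"AttributeName": "title", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "by-year",
            "KeySchema": [
                {"AttributeName": "author", "KeyType": "HASH"},  # same partition key
                {"AttributeName": "year", "KeyType": "RANGE"},   # different sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)
```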

Global secondary indexes (GSI)

  • Think of them as separate tables, replicated from the main table.
  • Can have different keys.
  • Can be created and deleted as needed (see the sketch below).
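Unlike LSIs, GSIs can be added to (and dropped from) a live table; a sketch with made-up names:

```python
import boto3
from boto3.dynamodb.conditions import Key

client = boto3.client("dynamodb")

# Add an index keyed by a different attribute; DynamoDB backfills it
# in the background, like a replicated side table.
client.update_table(
    TableName="books",
    AttributeDefinitions=[{"AttributeName": "genre", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "by-genre",
                "KeySchema": [{"AttributeName": "genre", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 1,
                    "WriteCapacityUnits": 1,
                },
            }
        }
    ],
)

# Queries against a GSI go through IndexName and are always
# eventually consistent.
table = boto3.resource("dynamodb").Table("books")
table.query(IndexName="by-genre", KeyConditionExpression=Key("genre").eq("sci-fi"))
```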

DynamoDB streams

DynamoDB can keep a stream of the table's writes for up to 24 hours.
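The typical consumer is a Lambda function subscribed to the stream; a minimal handler sketch (the actual processing is left out):

```python
def handler(event, context):
    # Each record describes one write; the stream retains records
    # for up to 24 hours.
    for record in event["Records"]:
        action = record["eventName"]                    # INSERT / MODIFY / REMOVE
        keys = record["dynamodb"]["Keys"]               # primary key of the item
        new_image = record["dynamodb"].get("NewImage")  # present if the stream view includes it
        print(action, keys, new_image)
```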

Expiring items

  • Set a TTL attribute. Removal is lazy: the item may linger for up to 48 hours after expiration (see the sketch below).
  • You can move expired items to cold storage (S3) via Kinesis and Lambda.
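A sketch of both steps; the `expires_at` attribute name and the one-day TTL are made up:

```python
import time

import boto3

# Point DynamoDB at the numeric attribute holding the expiration timestamp.
boto3.client("dynamodb").update_time_to_live(
    TableName="books",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# TTL values are Unix timestamps in seconds; deletion is lazy and may
# lag expiration by up to 48 hours.
table = boto3.resource("dynamodb").Table("books")
table.put_item(
    Item={
        "author": "Kurt Vonnegut",
        "title": "Cat's Cradle",
        "expires_at": int(time.time()) + 24 * 3600,
    }
)
```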

Design considerations

  • Partition keys should have high cardinality.
  • Avoid hot partitions and hot keys.
  • If your data is historical, use different tables for high and low loads: different tables can have different provisioned capacity settings, and ancient data can be archived to S3.
  • If hot keys are inevitable, consider write sharding and the scatter/gather pattern (see the sketch below).
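A minimal write-sharding sketch, assuming a hypothetical `events` table with partition key `pk` and sort key `sk`: writes scatter one hot key across N suffixed partition keys, and reads gather all the shards back.

```python
import random
import uuid

import boto3
from boto3.dynamodb.conditions import Key

N_SHARDS = 10
table = boto3.resource("dynamodb").Table("events")  # hypothetical table

def write(hot_key: str, payload: dict) -> None:
    # Scatter: spread writes for one hot key across N partition keys.
    shard = random.randrange(N_SHARDS)
    table.put_item(
        Item={"pk": f"{hot_key}#{shard}", "sk": str(uuid.uuid4()), **payload}
    )

def read_all(hot_key: str) -> list:
    # Gather: query every shard and merge the results client-side.
    items = []
    for shard in range(N_SHARDS):
        resp = table.query(
            KeyConditionExpression=Key("pk").eq(f"{hot_key}#{shard}")
        )
        items.extend(resp["Items"])
    return items
```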