NoSQL Databases & DynamoDB

DynamoDB

Point-in-Time Recovery (PITR)

Not enabled by default (enabled on a table-by-table basis)
A continuous record of changes allows you to restore to any point within a 35-day window, with 1-second granularity
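A minimal sketch of the PITR workflow, assuming the parameter names of boto3's DynamoDB client (`PointInTimeRecoverySpecification` for `update_continuous_backups`, and `SourceTableName`/`TargetTableName`/`RestoreDateTime` for `restore_table_to_point_in_time`). It only builds the request dictionaries and checks the 35-day window locally. No AWS call is made, and any table names you pass are your own.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # restore window described in these notes


def enable_pitr_request(table_name: str) -> dict:
    """Kwargs for UpdateContinuousBackups; PITR is off until enabled per table."""
    # Parameter names assumed to follow boto3's DynamoDB client.
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def restore_request(source: str, target: str, restore_to: datetime, now: datetime) -> dict:
    """Kwargs for RestoreTableToPointInTime, after validating the 35-day window."""
    if restore_to > now:
        raise ValueError("cannot restore to a time in the future")
    if now - restore_to > PITR_WINDOW:
        raise ValueError("restore point is outside the 35-day PITR window")
    return {
        "SourceTableName": source,
        "TargetTableName": target,  # a PITR restore is written to a new table
        # Restores have 1-second granularity, so drop sub-second precision.
        "RestoreDateTime": restore_to.replace(microsecond=0),
    }
```

With boto3 installed and credentials configured, these dictionaries could be unpacked into `client.update_continuous_backups(**...)` and `client.restore_table_to_point_in_time(**...)`.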

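Query & Indexes

Query is the most efficient operation in DynamoDB, but it can only work on one partition key (PK) value at a time, optionally with a single sort key value or a range of sort key values
Indexes are alternative views on table data: Local Secondary Indexes (LSI) and Global Secondary Indexes (GSI)
For both index types, you can choose which attributes from the base table are projected into the index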
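The Query and index rules above can be sketched as request builders. This assumes the low-level boto3 DynamoDB request shapes (`KeyConditionExpression`, `ExpressionAttributeNames`, typed `ExpressionAttributeValues`, and a `Projection` block in index definitions), string (`S`) key types, and on-demand billing so no `ProvisionedThroughput` block is needed. The function names are hypothetical. For an LSI, the partition key in `index_definition` must be the same as the base table's.

```python
from typing import List, Optional


def index_definition(name: str, pk: str, sk: str,
                     projected: Optional[List[str]] = None) -> dict:
    """Secondary-index definition as used in CreateTable (on-demand billing assumed).

    Only keys are projected unless extra non-key attributes are listed.
    """
    projection = (
        {"ProjectionType": "INCLUDE", "NonKeyAttributes": projected}
        if projected
        else {"ProjectionType": "KEYS_ONLY"}
    )
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": pk, "KeyType": "HASH"},
            {"AttributeName": sk, "KeyType": "RANGE"},
        ],
        "Projection": projection,
    }


def query_request(table: str, pk_name: str, pk_value: str,
                  sk_name: Optional[str] = None, sk_low: Optional[str] = None,
                  sk_high: Optional[str] = None, index: Optional[str] = None) -> dict:
    """Query kwargs: exactly one PK value, plus an optional single SK value or SK range."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    condition = "#pk = :pk"  # Query always needs one partition-key equality
    if sk_name is not None:
        if sk_low is None:
            raise ValueError("a sort-key condition needs at least one value")
        names["#sk"] = sk_name
        values[":lo"] = {"S": sk_low}
        if sk_high is None:
            condition += " AND #sk = :lo"  # single sort-key value
        else:
            values[":hi"] = {"S": sk_high}
            condition += " AND #sk BETWEEN :lo AND :hi"  # range of sort-key values
    request = {
        "TableName": table,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
    if index is not None:
        request["IndexName"] = index  # query the index view instead of the base table
    return request
```

Querying a GSI uses the same single-PK rule. The index's own partition key simply replaces the table's.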

LSI & GSI Considerations

Streams

Trigger
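As a minimal sketch of a stream trigger, assume the standard Lambda event shape for DynamoDB Streams: a `Records` list whose entries carry `eventName` (`INSERT`, `MODIFY` or `REMOVE`) and a `dynamodb` block with `Keys` and, depending on the stream view type, `NewImage`/`OldImage`. The handler name and the summary it returns are hypothetical.

```python
def lambda_handler(event: dict, context=None) -> dict:
    """Hypothetical Lambda trigger that summarises one batch of stream records."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    upserts, deletes = [], []
    for record in event.get("Records", []):
        name = record["eventName"]
        summary[name] += 1
        change = record.get("dynamodb", {})
        if name == "REMOVE":
            # Deleted items have no new state; the keys identify what was removed.
            deletes.append(change.get("Keys"))
        else:
            # NewImage is only present with NEW_IMAGE or NEW_AND_OLD_IMAGES view types,
            # so fall back to the keys when it is absent (e.g. KEYS_ONLY streams).
            upserts.append(change.get("NewImage", change.get("Keys")))
    return {"summary": summary, "upserts": upserts, "deletes": deletes}
```

Attribute values in the images stay in DynamoDB's typed form (for example `{"N": "3"}`). A real trigger would convert them before acting on the change.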

Global Tables

Global tables provide multi-master, cross-region replication
Tables are created in multiple regions and added to the same global table
Conflicts are resolved with last-writer-wins (the most recent write is kept)
Reads and writes can occur in any region
Replication between regions is generally sub-second
Strongly consistent reads are ONLY available in the same region as the write (otherwise reads are eventually consistent)
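Last-writer-wins can be shown with a toy model. This is not how DynamoDB implements replication internally. Each region holds its own copy of an item with a write timestamp, and once replication catches up every region keeps the newest version. The earlier write is silently lost. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    written_at: float  # write timestamp in seconds; the newest one wins
    region: str


def resolve(*versions: Version) -> Version:
    """Last-writer-wins: keep the version with the latest timestamp.

    On an exact tie, max() keeps the first argument (an arbitrary choice in this model).
    """
    return max(versions, key=lambda v: v.written_at)


def converge(replicas: dict) -> dict:
    """After replication, every region holds the same winning version of each key."""
    keys = set().union(*(r.keys() for r in replicas.values()))
    merged = {k: resolve(*(r[k] for r in replicas.values() if k in r)) for k in keys}
    return {region: dict(merged) for region in replicas}
```

If two regions update the same item within the replication delay, only one update survives. Applications that cannot tolerate that usually route writes for a given item to one region.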

DAX (DynamoDB Accelerator)

Primary node (writes) and replicas (reads)
Nodes are HA; if the primary fails, an election promotes a replica
In-memory cache for scaling: much faster reads at reduced cost
Scales up (larger nodes) and out (more replicas)
Supports write-through (writes are committed to DynamoDB and then written to the cache)
While DynamoDB is a public AWS service, DAX is deployed within a VPC
Reduces the response time of read operations
Write-heavy applications do not benefit from DAX; read-heavy applications that need microsecond (sub-millisecond) read latency do
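The read and write-through behaviour described above can be sketched as a toy cache in front of a slower store. A plain dict stands in for the DynamoDB table (a hypothetical simplification). Eviction, item TTLs and replica nodes are not modelled.

```python
class WriteThroughCache:
    """Toy model of a DAX-style item cache; `backend` is a dict standing in for the table."""

    def __init__(self, backend: dict):
        self.backend = backend
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1  # served from memory: the fast path
            return self.cache[key]
        self.misses += 1
        value = self.backend.get(key)  # cache miss: read from the table
        if value is not None:
            self.cache[key] = value  # populate so the next read is a hit
        return value

    def put(self, key, value):
        # Write-through: commit to the table first, then update the cache.
        self.backend[key] = value
        self.cache[key] = value
```

Every `put` still pays the full cost of writing to the table, which is why write-heavy workloads gain little. The benefit shows up as the hit count grows on repeated reads of the same items.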

TTL (Time to Live)

A timestamp for automatic DELETE of items
When TTL is enabled on a table, a specific attribute is selected as the TTL attribute
A per-partition process runs periodically, comparing the current time (in seconds since the epoch) with the value in the TTL attribute
Items whose TTL attribute is earlier than the current time are marked as expired
Another per-partition background process scans for expired items and removes them from tables and indexes; if streams are enabled, a delete is added to the stream
DELETE operations caused by TTL are background system processes: they don't impact table performance and aren't chargeable
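The two-phase process above can be modelled in a few lines. This is a sketch, not DynamoDB's internal implementation. The attribute name `expires`, the dict-based table and the stream-record shape are hypothetical. Items without a numeric TTL attribute are never expired, and an expired item still exists until the sweep runs.

```python
import time
from typing import Optional


def ttl_value(seconds_from_now: int, now: Optional[float] = None) -> int:
    """Epoch-seconds value to store in the TTL attribute."""
    base = time.time() if now is None else now
    return int(base) + seconds_from_now


def mark_expired(table: dict, ttl_attr: str, now: int) -> set:
    """Phase 1: find items whose TTL timestamp is earlier than the current time."""
    return {
        pk for pk, item in table.items()
        if isinstance(item.get(ttl_attr), (int, float)) and item[ttl_attr] < now
    }


def sweep(table: dict, expired: set, stream: Optional[list] = None) -> None:
    """Phase 2: delete expired items and, if a stream is enabled, record each delete."""
    for pk in sorted(expired):
        item = table.pop(pk, None)
        if item is not None and stream is not None:
            stream.append({"eventName": "REMOVE", "key": pk})
```

Because the phases are separate, a read between them can still return an item that is already past its TTL.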

Athena

Serverless interactive query service
Ad-hoc queries on data; pay only for the data scanned
Schema-on-read: a table-like translation
Original data is never changed and remains on S3
The schema translates data into a relational-like form when it is read
Output can be sent to other AWS services
Tables are defined in advance in a data catalog, and data is projected through them when read. This allows SQL-like queries on data without transforming the source data
Use cases:
Queries where loading/transformation isn't desired
Occasional / ad-hoc queries on data in S3
Serverless querying scenarios where cost matters
Querying AWS logs: VPC Flow Logs, CloudTrail, ELB logs, cost reports, etc.
AWS Glue Data Catalog & web server logs
Athena Federated Query can query data sources other than S3
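Athena itself runs SQL. This sketch only mimics schema-on-read in plain Python. A raw text string stands in for an S3 object and is never modified. A schema defined up front, like a data catalog table, is applied as each line is read. The field list follows the default (version 2) VPC Flow Logs format. The function names and the aggregation are hypothetical.

```python
from typing import Iterator

# Schema defined in advance, like a catalog table; applied only at read time.
FLOW_LOG_SCHEMA = [
    ("version", int), ("account_id", str), ("interface_id", str),
    ("srcaddr", str), ("dstaddr", str), ("srcport", int), ("dstport", int),
    ("protocol", int), ("packets", int), ("bytes", int),
    ("start", int), ("end", int), ("action", str), ("log_status", str),
]


def read_with_schema(raw: str, schema=FLOW_LOG_SCHEMA) -> Iterator[dict]:
    """Project raw space-separated lines through the schema; `raw` is never changed."""
    for line in raw.splitlines():
        fields = line.split()
        if len(fields) != len(schema):
            continue  # lines that don't fit the schema are skipped
        # Flow logs use "-" for missing values, which become None here.
        yield {name: (None if value == "-" else cast(value))
               for (name, cast), value in zip(schema, fields)}


def rejected_bytes_by_source(raw: str) -> dict:
    """Roughly: SELECT srcaddr, SUM(bytes) WHERE action = 'REJECT' GROUP BY srcaddr."""
    totals = {}
    for row in read_with_schema(raw):
        if row["action"] == "REJECT":
            totals[row["srcaddr"]] = totals.get(row["srcaddr"], 0) + (row["bytes"] or 0)
    return totals
```

Changing the schema changes only how the data is interpreted on the next read. The underlying text, like the objects in S3, stays exactly as it was.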

Redshift

Petabyte-scale data warehouse
OLAP (column-based), not OLTP (row/transaction-based)
Pay as you use, similar to RDS
Query S3 directly using Redshift Spectrum
Query other databases directly using Federated Query
Server-based (not serverless)
Runs in one AZ in a VPC, so it is not HA
Leader node: query input, planning & aggregation
Compute nodes: perform the queries on the data
VPC security, IAM permissions, KMS encryption at rest, CloudWatch monitoring
Redshift Enhanced VPC Routing: by default Redshift uses public routes for traffic, but when Enhanced VPC Routing is enabled, traffic is routed via VPC networking (SGs, NACLs, VPC gateways)
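The leader/compute split can be illustrated with a toy scatter-gather. This is not Redshift's actual planner. Each hypothetical compute node holds a slice of the table stored column by column and reads only the column the query needs. The leader fans the work out and combines partial results. AVG is assembled from partial SUMs and COUNTs because averaging per-node averages would be wrong when slices differ in size.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(slice_columns: dict, column: str) -> tuple:
    """Work on one compute node: scan only the needed column of its local slice."""
    values = slice_columns[column]  # columnar storage: other columns are never read
    return sum(values), len(values)


def leader_node_avg(slices: list, column: str) -> float:
    """Leader node: fan the query out to compute nodes, then aggregate partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: compute_node_partial(s, column), slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else float("nan")
```

The slices here differ in size on purpose, which is exactly the case where combining SUM and COUNT gives the correct average and averaging the averages does not.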

Automatic incremental backups to S3 occur roughly every 8 hours or every 5 GB per node of data changes (whichever comes first) and by default have a 1-day retention period (configurable up to 35 days)
Manual snapshots are taken on demand, stored in S3, and do not expire unless manually deleted
Because backups are stored in S3, they protect against AZ failure
Restoring from a snapshot creates a brand-new cluster
Redshift can be configured to copy snapshots to another AWS region for DR, with a separate configurable retention period
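The automated-snapshot rules can be sketched as two small checks. The trigger thresholds (about 8 hours, or about 5 GB of changed data, applied per node here following AWS's description) are approximations. The function names and the snapshot-age dictionary are hypothetical. The same expiry check can be applied separately in the DR region, since copied snapshots have their own retention period.

```python
SNAPSHOT_INTERVAL_HOURS = 8
SNAPSHOT_GB_PER_NODE = 5


def automated_snapshot_due(hours_since_last: float, gb_changed_per_node: float) -> bool:
    """Approximate trigger: ~8 hours elapsed or ~5 GB per node changed, whichever comes first."""
    return (hours_since_last >= SNAPSHOT_INTERVAL_HOURS
            or gb_changed_per_node >= SNAPSHOT_GB_PER_NODE)


def snapshots_to_expire(snapshot_ages_days: dict, retention_days: int = 1) -> list:
    """Automated snapshots older than the retention period are removed.

    Manual snapshots are not passed in here: they persist until deleted.
    """
    return sorted(sid for sid, age in snapshot_ages_days.items() if age > retention_days)
```

For example, calling `snapshots_to_expire` with `retention_days=1` for the source region and a longer value for the DR copies models the separate retention periods.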