NoSQL Database and Data Warehouse
Amazon DynamoDB and Amazon Redshift
1. Relational vs. Non-Relational Databases
| Feature | Relational (SQL) | Non-Relational (NoSQL) |
|---|---|---|
| Data storage | Rows and columns | Key-value, document, graph |
| Schema | Fixed | Dynamic / flexible |
| Querying | Uses SQL | Varies by product: key lookups, document queries, or graph traversals |
| Scalability | Vertical (larger instances) | Horizontal (more nodes) |
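The schema row of the table can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database and a plain list of dictionaries as a stand-in for a document store. The `books` table and its attributes are hypothetical, chosen only for illustration.

```python
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT)")
conn.execute("INSERT INTO books VALUES (?, ?)", ("Herbert", "Dune"))


def insert_with_extra_column():
    """Try to write a column that the schema does not declare."""
    try:
        conn.execute(
            "INSERT INTO books (author, title, isbn) VALUES (?, ?, ?)",
            ("Le Guin", "The Dispossessed", "n/a"),
        )
        return "accepted"
    except sqlite3.OperationalError:
        # SQLite refuses the write because "isbn" is not a known column.
        return "rejected"


relational_result = insert_with_extra_column()

# Non-relational (document style): each item carries its own attributes.
documents = [
    {"author": "Herbert", "title": "Dune"},
    {"author": "Le Guin", "title": "The Dispossessed", "awards": ["Hugo", "Nebula"]},
]
```

In the relational case the extra column would require a schema change first; in the document case the second item simply has one more attribute than the first.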
2. Amazon DynamoDB
Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is fully managed: AWS handles the underlying infrastructure.
Key Characteristics
- No practical table size limit: Some customers have production tables with billions of items.
- Flexible schema: Items in the same table can have different attributes. No schema migrations needed.
- Scalable throughput: Provision read/write capacity manually or enable automatic scaling. With automatic scaling enabled, DynamoDB monitors load and adjusts capacity for you.
- Global Tables: Automatically replicate tables across your choice of AWS Regions.
- Encryption at rest and Time to Live (TTL): Data is encrypted at rest, and items can be set to expire automatically.
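As a rough illustration of item TTL: DynamoDB reads the expiry time from a Number attribute holding a Unix epoch timestamp in seconds, and you choose which attribute that is when you enable TTL on the table. The sketch below only builds such an item locally. The attribute name `expires_at` and the session item are assumptions made for the example.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def with_expiry(item, days, now=None):
    """Return a copy of item with a TTL attribute set as Unix epoch seconds."""
    now = int(time.time()) if now is None else int(now)
    # "expires_at" is a hypothetical attribute name; TTL uses whichever
    # attribute the table has been configured with.
    return {**item, "expires_at": now + days * SECONDS_PER_DAY}


# A fixed "now" keeps the example deterministic.
session = with_expiry(
    {"session_id": "abc123", "user": "alice"}, days=7, now=1_700_000_000
)
```

Passing `now` explicitly is only a convenience for testing; in normal use the function falls back to the current time.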
Core Components
| Component | Description |
|---|---|
| Table | A collection of data. |
| Item | A group of attributes uniquely identifiable among all other items. |
| Attributes | Fundamental data elements (like columns in a relational database). |
Primary Keys
- Partition key (simple primary key): A single attribute that uniquely identifies an item.
- Partition key + Sort key (composite key): Two attributes combined to identify items. Useful when frequently querying by a category plus a detail (e.g., author + title).
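A composite key can be sketched as the request parameters for DynamoDB's CreateTable operation, where the partition key has key type `HASH` and the sort key has key type `RANGE`. The code below only builds the parameter dictionary and does not call AWS. The table name `Books`, the string attribute types, and the on-demand billing mode are assumptions for illustration.

```python
def composite_key_table(table_name, partition_attr, sort_attr):
    """Build CreateTable parameters for a table with a composite primary key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes stay schemaless.
        "AttributeDefinitions": [
            {"AttributeName": partition_attr, "AttributeType": "S"},
            {"AttributeName": sort_attr, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": partition_attr, "KeyType": "HASH"},  # partition key
            {"AttributeName": sort_attr, "KeyType": "RANGE"},  # sort key
        ],
        # Assumption: on-demand capacity, so no throughput numbers are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


# Hypothetical table following the author + title example above.
params = composite_key_table("Books", "author", "title")
```

With the boto3 SDK installed and credentials configured, these parameters could be passed as `boto3.client("dynamodb").create_table(**params)`; that call is not made here.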
Query vs. Scan
- Query: Uses the primary key to locate items efficiently. You supply a partition key value (and optionally a condition on the sort key), so DynamoDB reads only the matching partition.
- Scan: Examines every item in the table to find matches on non-key attributes. Less efficient for large tables.
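The difference in work done can be shown with a toy in-memory model. This is not how DynamoDB is implemented; it is a simplified sketch in which `ToyTable` and the book data are hypothetical, and each operation reports how many items it had to examine.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table keyed by author (partition) and title (sort)."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # partition key -> {sort key: item}

    def put(self, item):
        self.partitions[item["author"]][item["title"]] = item

    def query(self, author):
        """Read one partition only; returns (items, items_examined)."""
        items = list(self.partitions.get(author, {}).values())
        return items, len(items)

    def scan(self, predicate):
        """Examine every item in every partition; returns (matches, items_examined)."""
        examined = 0
        matches = []
        for partition in self.partitions.values():
            for item in partition.values():
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined


table = ToyTable()
for author, title, year in [
    ("Herbert", "Dune", 1965),
    ("Herbert", "Children of Dune", 1976),
    ("Le Guin", "The Dispossessed", 1974),
    ("Asimov", "Foundation", 1951),
]:
    table.put({"author": author, "title": title, "year": year})

# Key-based lookup: touches only the "Herbert" partition.
by_author, query_cost = table.query("Herbert")

# Filter on a non-key attribute: every item must be examined.
by_year, scan_cost = table.scan(lambda item: item["year"] < 1970)
```

Here the query examines 2 items while the scan examines all 4, and the gap grows with table size.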
Common Use Cases
DynamoDB suits mobile and web applications, gaming, ad tech, and IoT applications, especially when a large number of clients generate data and make many requests per second.
3. Amazon Redshift
Amazon Redshift is a fast, fully managed petabyte-scale data warehouse. It enables you to run complex analytic queries against structured data using standard SQL and your existing business intelligence (BI) tools.
Architecture
- Leader node: Manages communications with clients, parses queries, develops execution plans, and compiles code for compute nodes.
- Compute nodes: Run compiled code and send intermediate results back to the leader node for final aggregation.
- Redshift Spectrum: Runs queries against exabytes of data directly in Amazon S3 without loading it into Redshift.
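The division of labor between the leader node and the compute nodes can be sketched as a two-stage aggregation. This is a simplified model, not Redshift's actual execution engine: the function names and sales data are hypothetical, and the slices are processed one after another here, whereas a real cluster processes them in parallel.

```python
def compute_node(rows):
    """Aggregate one slice of the data into (sum, count) per region."""
    partial = {}
    for region, amount in rows:
        total, count = partial.get(region, (0, 0))
        partial[region] = (total + amount, count + 1)
    return partial


def leader_node(slices):
    """Merge the intermediate results and finish the aggregation (average per region)."""
    merged = {}
    for partial in map(compute_node, slices):
        for region, (total, count) in partial.items():
            merged_total, merged_count = merged.get(region, (0, 0))
            merged[region] = (merged_total + total, merged_count + count)
    return {region: total / count for region, (total, count) in merged.items()}


# Each inner list is the data held by one compute node.
slices = [
    [("eu", 10), ("us", 20)],
    [("eu", 30), ("us", 40), ("us", 60)],
]
averages = leader_node(slices)
```

Each compute node returns sums and counts rather than averages, because averages from separate slices cannot be combined correctly without the counts.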
Key Features
- Columnar storage: Data is stored by column instead of by row, which speeds up analytic queries that read only a few columns across many rows.
- Massively parallel processing: Distributes data and queries across multiple nodes for high performance. Most results return in seconds.
- Automatic monitoring and backup: Continuously monitors the cluster and backs up data for easy restore.
- Built-in encryption: Encryption at rest and in transit.
- Scalable: Add more nodes with no downtime. Pricing starts at 25 cents per hour.
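Why columnar storage helps analytic queries can be seen in a small sketch that counts how many stored values each layout has to read to sum a single column. The `orders` data and function names are hypothetical, and real storage engines work with disk blocks and compression rather than Python lists.

```python
rows = [
    {"order_id": 1, "region": "eu", "amount": 10},
    {"order_id": 2, "region": "us", "amount": 20},
    {"order_id": 3, "region": "eu", "amount": 30},
]

# Column layout: one contiguous list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_layout(rows, column):
    """Sum one column from row storage; returns (total, values_read)."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[column]
    return total, values_read


def sum_column_layout(columns, column):
    """Sum one column from column storage; returns (total, values_read)."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_reads = sum_row_layout(rows, "amount")
col_total, col_reads = sum_column_layout(columns, "amount")
```

Both layouts give the same total, but the row layout reads 9 values and the column layout reads 3, since only the `amount` column is touched.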
Use Cases
- Enterprise data warehouse migration with agility and low upfront cost
- Big data analytics at a low price point
- SaaS applications providing analytic capabilities
DynamoDB vs. RDS vs. Redshift
Transactional workload with complex queries and joins? → Amazon RDS or Aurora (relational).
Simple key-value lookups at massive scale with single-digit millisecond latency? → Amazon DynamoDB (NoSQL).
Analytic queries on petabytes of structured data using BI tools? → Amazon Redshift (data warehouse).