Lesson 36

NoSQL Database and Data Warehouse

Amazon DynamoDB and Amazon Redshift

1. Relational vs. Non-Relational Databases

Feature | Relational (SQL) | Non-Relational (NoSQL)
Data storage | Rows and columns | Key-value, document, graph
Schema | Fixed | Dynamic / flexible
Querying | Uses SQL | Focuses on collections of documents
Scalability | Vertical (larger instances) | Horizontal (more nodes)
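
To make the schema row concrete, here is a minimal sketch using only the Python standard library. It uses SQLite as a stand-in for a relational engine and a plain dictionary as a stand-in for a document store; the `books` table, its columns, and the sample records are assumptions made up for this illustration.

```python
import sqlite3

# Relational: the table declares its columns up front (fixed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO books VALUES (?, ?, ?)", ("111", "Dune", "Frank Herbert"))


def insert_with_extra_column(connection):
    """Try to store an attribute that the schema does not declare."""
    try:
        connection.execute(
            "INSERT INTO books (isbn, title, narrator) VALUES (?, ?, ?)",
            ("222", "Emma", "Some Narrator"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the unknown column; the schema would have to change first.
        return False


# Non-relational (document style): each item describes itself, so two items
# in the same collection can carry different attributes.
documents = {
    "111": {"title": "Dune", "author": "Frank Herbert"},
    "222": {"title": "Emma", "narrator": "Some Narrator", "format": "audiobook"},
}

relational_accepts_new_attribute = insert_with_extra_column(conn)
attribute_sets = {key: sorted(doc) for key, doc in documents.items()}
```

The relational insert fails until the table is altered, while the dictionary accepts the second record with its extra attributes unchanged.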

2. Amazon DynamoDB

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is fully managed, and AWS handles all the underlying data infrastructure.

Core idea: DynamoDB is a serverless NoSQL database that stores data exclusively on SSDs. It supports document and key-value data models, with virtually unlimited storage and throughput that scales with demand.

Key Characteristics

  • No practical table size limit: Some customers have production tables with billions of items.
  • Flexible schema: Items in the same table can have different attributes. No schema migrations needed.
  • Scalable throughput: Provision read/write capacity manually or enable automatic scaling. DynamoDB monitors load and adjusts automatically.
  • Global Tables: Automatically replicate tables across your choice of AWS Regions.
  • Encryption at rest and item Time-to-Live (TTL).
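
As a rough illustration of item TTL: DynamoDB reads the expiry time from a Number attribute that holds a Unix timestamp in seconds. The sketch below only prepares and checks such a timestamp locally. The attribute name `expires_at`, the helper names, and the sample session item are assumptions for this example. Because expired items are deleted in the background rather than at the exact expiry second, applications commonly also filter them out on read, which is what `is_expired` models.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name chosen for this sketch


def with_ttl(item, lifetime_seconds, now=None):
    """Return a copy of the item with an expiry timestamp in epoch seconds."""
    now = int(time.time()) if now is None else int(now)
    stamped = dict(item)
    stamped[TTL_ATTRIBUTE] = now + lifetime_seconds
    return stamped


def is_expired(item, now=None):
    """True when the item carries a TTL timestamp that has already passed."""
    now = int(time.time()) if now is None else int(now)
    return TTL_ATTRIBUTE in item and item[TTL_ATTRIBUTE] <= now


# A one-hour session created at a fixed, made-up point in time.
session = with_ttl(
    {"session_id": "abc123", "user": "maria"},
    lifetime_seconds=3600,
    now=1_700_000_000,
)
```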

Core Components

Component | Description
Table | A collection of data.
Item | A group of attributes that is uniquely identifiable among all other items.
Attributes | Fundamental data elements (like columns in a relational database).

Primary Keys

  • Partition key (simple primary key): A single attribute that uniquely identifies an item.
  • Partition key + Sort key (composite key): Two attributes combined to identify items. Useful when frequently querying by a category plus a detail (e.g., author + title).
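
The two key styles can be sketched as table definitions. The helper below builds a dictionary shaped like the keyword arguments of boto3's DynamoDB `create_table` call, where `HASH` marks the partition key and `RANGE` marks the sort key. The helper name, the table names, and the choice of string-typed keys with on-demand billing are assumptions for illustration. Nothing here contacts AWS.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build create_table-style arguments for a simple or composite primary key."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    # Assumption: both key attributes are strings ("S").
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "BillingMode": "PAY_PER_REQUEST",
    }


simple = table_definition("Users", "user_id")
composite = table_definition("Books", "author", "title")

# With boto3 installed and AWS credentials configured, a definition could be
# passed on like this (not executed in this sketch):
#   import boto3
#   boto3.client("dynamodb").create_table(**composite)
```

Only the key attributes are declared. Every other attribute is left to the individual items, which is where the flexible schema comes from.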

Query vs. Scan

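  • Query: Uses the primary key (a partition key value, optionally narrowed by a condition on the sort key) to locate items efficiently. Because only the matching partition is read, a query takes advantage of partitioning.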
  • Scan: Examines every item in the table to find matches on non-key attributes. Less efficient for large tables.
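
The cost difference can be seen in a toy in-memory model. This is not how DynamoDB is implemented. It is a simplified sketch in which the class `TinyTable`, its methods, and the sample book data are all invented for illustration. The point is the count of items each operation has to examine.

```python
from collections import defaultdict


class TinyTable:
    """Toy table: items grouped by partition key, then by sort key."""

    def __init__(self):
        self.partitions = defaultdict(dict)  # partition key -> {sort key: item}

    def put(self, partition_key, sort_key, item):
        self.partitions[partition_key][sort_key] = item

    def query(self, partition_key):
        """Read a single partition; return (items in sort-key order, items examined)."""
        bucket = self.partitions.get(partition_key, {})
        items = [bucket[key] for key in sorted(bucket)]
        return items, len(bucket)

    def scan(self, predicate):
        """Walk every item in every partition; return (matches, items examined)."""
        examined = 0
        matches = []
        for bucket in self.partitions.values():
            for item in bucket.values():
                examined += 1
                if predicate(item):
                    matches.append(item)
        return matches, examined


books = TinyTable()
books.put("Austen", "Emma", {"title": "Emma", "year": 1815})
books.put("Austen", "Persuasion", {"title": "Persuasion", "year": 1817})
books.put("Herbert", "Dune", {"title": "Dune", "year": 1965})
books.put("Orwell", "1984", {"title": "1984", "year": 1949})
books.put("Orwell", "Animal Farm", {"title": "Animal Farm", "year": 1945})

by_author, query_cost = books.query("Austen")
before_1900, scan_cost = books.scan(lambda item: item["year"] < 1900)
```

Both calls return the same two books, but the query examines 2 items while the scan examines all 5. On a table with billions of items that gap becomes the difference between a fast lookup and a full pass over the data.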

Common Use Cases

Mobile and web applications, gaming, ad tech, and IoT applications, especially when a large number of clients generate data and make many requests per second.

3. Amazon Redshift

Amazon Redshift is a fast, fully managed petabyte-scale data warehouse. It enables you to run complex analytic queries against structured data using standard SQL and your existing business intelligence (BI) tools.

Core idea: Redshift is for analytics, not transaction processing. Think OLAP (Online Analytical Processing) vs. OLTP (Online Transaction Processing). Redshift = analytics queries on massive datasets.

Architecture

  • Leader node: Manages communications with clients, parses queries, develops execution plans, and compiles code for compute nodes.
  • Compute nodes: Run compiled code and send intermediate results back to the leader node for final aggregation.
  • Redshift Spectrum: Runs queries against exabytes of data directly in Amazon S3 without loading it into Redshift.
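
The division of work between the leader node and the compute nodes can be sketched as follows. This is a simplified model rather than Redshift's actual execution engine: threads stand in for compute nodes, and the function names, the round-robin slicing, and the sample sales rows are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """A compute node aggregates only its own slice of the data."""
    return {"count": len(rows), "total": sum(row["amount"] for row in rows)}


def leader_node(rows, node_count):
    """Split the work, collect intermediate results, and do the final aggregation."""
    slices = [rows[i::node_count] for i in range(node_count)]  # round-robin split
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node, slices))
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "total": total, "average": total / count}


# Eight made-up orders with amounts 10, 20, ..., 80.
sales = [{"order_id": i, "amount": 10 * i} for i in range(1, 9)]
result = leader_node(sales, node_count=4)
```

Each node returns a small intermediate result (a count and a total), so the leader combines four partial answers instead of reading all eight rows itself.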

Key Features

  • Columnar storage: Data is stored by column instead of row, which dramatically speeds up analytic queries.
  • Massively parallel processing: Distributes data and queries across multiple nodes for high performance. Most results return in seconds.
  • Automatic monitoring and backup: Continuously monitors the cluster and backs up data for easy restore.
  • Built-in encryption: Encryption at rest and in transit.
  • Scalable: Add more nodes with no downtime. Pricing starts at 25 cents per hour.
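
To see why columnar storage helps analytic queries, compare how many values a sum over one column has to touch in each layout. The sketch below is a deliberately simplified model with invented sample data and function names. Real columnar engines add compression and block-level storage on top of this idea.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 200},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(table):
    """Sum the amount column; every value of every row is read along the way."""
    values_read = 0
    total = 0
    for row in table:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row["amount"]
    return total, values_read


def total_column_store(cols):
    """Sum the amount column; only that column is read."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


row_total, row_values_read = total_row_store(rows)
col_total, col_values_read = total_column_store(columns)
```

Both layouts give the same total of 400, but the row layout reads 9 values and the column layout reads 3. The gap widens as tables gain more columns that a query does not need.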

Use Cases

  • Enterprise data warehouse migration with agility and low upfront cost
  • Big data analytics at a low price point
  • SaaS applications providing analytic capabilities

DynamoDB vs. RDS vs. Redshift

Transactional workload with complex queries and joins? Amazon RDS or Aurora (relational).

Simple key-value lookups at massive scale with single-digit millisecond latency? Amazon DynamoDB (NoSQL).

Analytic queries on petabytes of structured data using BI tools? Amazon Redshift (data warehouse).
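
The same rule of thumb can be written as a small lookup. This is only a study aid that restates the three cases above. The function name and the workload labels are invented for this sketch, and real designs often weigh more factors than a single label.

```python
def suggest_service(workload):
    """Map a coarse workload label to the service suggested by the rule of thumb."""
    suggestions = {
        "transactional": "Amazon RDS or Aurora",  # complex queries and joins
        "key-value": "Amazon DynamoDB",           # simple lookups at massive scale
        "analytics": "Amazon Redshift",           # BI queries on petabytes of data
    }
    if workload not in suggestions:
        raise ValueError(f"unknown workload label: {workload!r}")
    return suggestions[workload]


gaming_leaderboard = suggest_service("key-value")
sales_reporting = suggest_service("analytics")
order_processing = suggest_service("transactional")
```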

4. Quick Quiz

Test Your Understanding


1. A mobile gaming company needs a database that provides single-digit millisecond latency at any scale with automatic throughput scaling. Which service is the best fit?
2. A business intelligence team needs to run complex SQL queries against petabytes of structured sales data. They use existing BI tools. Which service should they use?
3. An application stores items that have different attributes per item. New attributes are added over time without schema changes. Which database type supports this?
4. Which Amazon Redshift feature enables running queries directly against data stored in Amazon S3 without loading it into the cluster?
5. How does DynamoDB scale to handle increased read/write throughput?
6. What is the primary architectural difference between Redshift and RDS?
Primary Source: AWS Academy Module 8: Databases (module-8.pdf).
Ask your teacher: If you confuse when to use DynamoDB versus RDS, or Redshift versus RDS, remember: DynamoDB = NoSQL key-value at scale, RDS = transactional SQL, Redshift = analytics/data warehouse with SQL and BI tools.
Last updated: June 2026. © 2026 Shahriar Ahmed Shovon