Reliability & Trusted Advisor — Strict Mode: AWS Cloud Practitioner

Reliability & Availability Concepts

Reliability

Reliability is the probability that an entire system will function as intended for a specified period. It is commonly quantified with Mean Time Between Failures (MTBF), where MTBF = MTTF + MTTR.

MTTF: Mean Time To Failure — how long the system runs before failing.
MTTR: Mean Time To Repair — how long it takes to diagnose and fix the failure.
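
As a quick worked example of the MTBF relationship, the sketch below adds MTTF and MTTR and also reports the share of each failure cycle spent running. The function names and the 990 h / 10 h figures are made up for illustration and do not come from any AWS source.

```python
def mtbf_hours(mttf_hours: float, mttr_hours: float) -> float:
    """MTBF as defined in these notes: MTTF + MTTR."""
    return mttf_hours + mttr_hours


def uptime_fraction(mttf_hours: float, mttr_hours: float) -> float:
    """Share of one failure cycle spent running: MTTF / MTBF."""
    return mttf_hours / mtbf_hours(mttf_hours, mttr_hours)


# Hypothetical figures: the system runs 990 hours, then takes 10 hours to repair.
print(mtbf_hours(990.0, 10.0))  # 1000.0
print(f"{uptime_fraction(990.0, 10.0):.2%}")
```

Shortening MTTR raises the uptime share even when MTTF stays the same, which is why fast recovery matters as much as rare failure.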

Availability

Availability = normal operation time / total time. It's expressed as a percentage of uptime over a period (commonly 1 year). The common shorthand is "number of 9s":

Availability	Max Downtime Per Year	Example
99% (two 9s)	~3.65 days	Internal tools
99.9% (three 9s)	~8.76 hours	Business applications
99.99% (four 9s)	~52.56 minutes	Enterprise SaaS
99.999% (five 9s)	~5.26 minutes	Mission-critical systems
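
The downtime figures in the table can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 365-day year (8,760 hours). The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def max_downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system may be down at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = max_downtime_hours_per_year(pct)
    # Show the same value in days, hours, and minutes for comparison with the table.
    print(f"{pct}%: {hours / 24:.2f} days = {hours:.2f} hours = {hours * 60:.2f} minutes")
```

Each additional 9 cuts the allowed downtime by a factor of ten.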

High Availability

A highly available system can withstand some degradation and still remain available. Downtime is minimized, and little human intervention is needed. Services are restored rapidly, often in less than 1 minute.

Three Factors That Influence Availability

Factor	Description
Fault Tolerance	Built-in redundancy of components. System remains operational even if some components fail. Relies on specialized hardware for instant failover. Does NOT address software failures (the most common cause of downtime).
Scalability	Ability to accommodate increased capacity needs without changing the design. Contributes to availability but doesn't guarantee it.
Recoverability	Policies and procedures related to restoring service after a catastrophic event. Ability to restore quickly with no data loss.

Cost tradeoff: Improving availability usually increases cost. Balance the cost of improvement with the benefit to users. Decide whether "always reachable" or "servicing requests within acceptable performance" is the goal.

AWS Trusted Advisor

An online tool that provides real-time guidance to help you provision resources following AWS best practices. It examines your entire AWS environment and gives recommendations in five categories:

Category	What It Checks
Cost Optimization	Unused/idle resources; opportunities to commit to reserved capacity; potential monthly savings
Performance	Service limits; provisioned throughput utilization; overutilized instances
Security	IAM settings (MFA on root, password policy); security group rules with unrestricted access; S3 bucket permissions; enabling AWS security features
Fault Tolerance	Auto Scaling configuration; health checks; Multi-AZ deployments; backup capabilities (EBS snapshots, S3 bucket logging)
Service Limits	Usage above 80% of a service limit (values come from a snapshot, so current usage may differ; changes can take up to 24 hours to appear)
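
The 80% rule in the Service Limits row can be illustrated with a small sketch. This is not how Trusted Advisor is implemented. `QuotaUsage`, `over_threshold`, and every number below are hypothetical and serve only to show the threshold logic.

```python
from dataclasses import dataclass


@dataclass
class QuotaUsage:
    name: str   # hypothetical label for a service limit
    used: int   # current usage (in Trusted Advisor this comes from a snapshot)
    limit: int  # the service limit


def over_threshold(usages, threshold_percent=80):
    """Return names whose usage is strictly above threshold_percent of the limit."""
    return [
        u.name
        for u in usages
        # Integer arithmetic avoids floating-point edge cases at exactly 80%.
        if u.limit > 0 and u.used * 100 > u.limit * threshold_percent
    ]


# Made-up numbers, not real AWS limits.
snapshot = [
    QuotaUsage("VPCs per Region", 5, 5),
    QuotaUsage("Elastic IPs", 2, 5),
    QuotaUsage("EBS snapshots", 8100, 10000),
]
print(over_threshold(snapshot))  # ['VPCs per Region', 'EBS snapshots']
```

Usage at exactly 80% is not flagged here, since the check is described as usage exceeding 80%.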

Trusted Advisor Access by Support Plan

Plan	Checks Available
Basic & Developer	6 core checks (security and service limits)
Business & Enterprise	All checks (full suite across all 5 categories)
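
Trusted Advisor checks can also be listed programmatically through the AWS Support API, which is itself only available on the higher support plans. The sketch below groups check names by category using boto3's `describe_trusted_advisor_checks` call. The helper name `checks_by_category` is illustrative, and the commented usage assumes boto3 is installed, credentials are configured, and the account's plan includes Support API access.

```python
from collections import defaultdict


def checks_by_category(support_client):
    """Group Trusted Advisor check names by category.

    This only reads check metadata; it changes nothing in the account.
    """
    response = support_client.describe_trusted_advisor_checks(language="en")
    grouped = defaultdict(list)
    for check in response["checks"]:
        grouped[check["category"]].append(check["name"])
    return dict(grouped)


# Usage against the real service (assumptions as stated above):
#
#   import boto3
#   client = boto3.client("support", region_name="us-east-1")
#   for category, names in sorted(checks_by_category(client).items()):
#       print(f"{category}: {len(names)} checks")
```

Passing the client in as a parameter keeps the helper easy to try out with a stub object when no AWS account is at hand.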

Exam Tip: "Which AWS service provides security/performance/cost/fault tolerance optimization recommendations?" → AWS Trusted Advisor. "Provide" + "recommendations" = Trusted Advisor. It does NOT make changes — it only recommends.