An Aurora Deep Dive Series by Rathish Kumar B - Part 2
Transaction Flow in Traditional Database Systems
The following diagram illustrates this entire process, tracing the path of a single write transaction through the tightly-coupled layers of a monolithic database engine.
Transaction Flow in Traditional Database Systems
To see how these layers work in practice, let's trace the lifecycle of a single SQL statement using a concrete example: updating an account balance.
UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101;
SQL Parsing & Planning
- The SQL processor first checks for valid syntax.
- Then, it generates an execution plan. If account_id is indexed, the optimizer chooses an index scan over a full table scan.
- This step is lightweight, but the quality of the execution plan has a huge impact on performance.
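To see the planner's choice concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL/PostgreSQL (the table and values simply mirror the running example, and EXPLAIN QUERY PLAN is SQLite's analogue of EXPLAIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (101, 20000)")

# Because account_id is the (indexed) primary key, the planner reports an
# index search rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101"
).fetchall()
for row in plan:
    print(row)   # e.g. (..., 'SEARCH accounts USING INTEGER PRIMARY KEY (rowid=?)')
```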
Transaction Manager
- A transaction ID is assigned.
- Locks are acquired (e.g., exclusive row lock on account #101).
- The system enforces ACID guarantees: Atomicity, Consistency, Isolation, Durability.
A transaction has a clear start (an explicit BEGIN or an implicit one) and a clear end (COMMIT or ROLLBACK). Locking ensures no other transaction can modify this row until the current one completes, preventing dirty reads and write conflicts.
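Here is a minimal sketch of that lifecycle, again using sqlite3 so it runs as-is (SQLite locks at the database level rather than per row, but the begin/commit/rollback flow is the same):

```python
import sqlite3

conn = sqlite3.connect("bank.db", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE IF NOT EXISTS accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT OR IGNORE INTO accounts VALUES (101, 20000)")

try:
    conn.execute("BEGIN IMMEDIATE")  # explicit start; takes a write lock up front
    conn.execute("UPDATE accounts SET balance = balance - 5000 WHERE account_id = 101")
    conn.execute("COMMIT")           # the change becomes permanent only here
except sqlite3.Error:
    conn.execute("ROLLBACK")         # atomicity: on any failure, all work is undone
```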
Buffer Pool / Caching
- The database works with data in fixed-size memory blocks called pages.
- When the database needs a page, it first checks the buffer pool.
- If the page is there (a cache hit), it's read instantly from fast RAM.
- If the page isn't there (a cache miss), the system must perform a much slower read from the disk to fetch it into the cache.
- The page for account #101 is loaded into the pool, and the UPDATE happens in-memory, marking the page as "dirty."
By manipulating pages in this high-speed staging area, the database can perform operations thousands of times faster than if every access touched disk. But how are these fast, in-memory changes made safe in case of a crash? This is where the strict Write-Ahead Logging protocol comes into play. The diagram below illustrates the Buffer Pool in action, showing how database pages are fetched from disk, updated in memory, and eventually written back.
Database Buffer Pool. Source: CMU Database Systems
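The mechanics are easy to see in a toy version. The sketch below (a hypothetical BufferPool class, not any real engine's implementation) shows the hit/miss path, in-memory updates, the dirty flag, and LRU eviction:

```python
from collections import OrderedDict

class BufferPool:
    """Toy page cache: check memory first, fall back to disk, evict least-recently-used."""

    def __init__(self, capacity, disk):
        self.capacity = capacity       # number of pages that fit in RAM
        self.disk = disk               # stand-in for the data files: page_id -> page contents
        self.pages = OrderedDict()     # page_id -> (page contents, dirty flag)

    def get_page(self, page_id):
        if page_id in self.pages:                  # cache hit: served from fast RAM
            self.pages.move_to_end(page_id)
            return self.pages[page_id][0]
        page = self.disk[page_id]                  # cache miss: much slower disk read
        self._put(page_id, page, dirty=False)
        return page

    def update_page(self, page_id, new_contents):
        self.get_page(page_id)                     # make sure the page is resident
        self.pages[page_id] = (new_contents, True) # modify in memory and mark it dirty

    def _put(self, page_id, page, dirty):
        if len(self.pages) >= self.capacity:       # no room: evict the least recently used page
            old_id, (old_page, was_dirty) = self.pages.popitem(last=False)
            if was_dirty:
                self.disk[old_id] = old_page       # write the dirty page back before dropping it
        self.pages[page_id] = (page, dirty)

# Pretend page 7 on disk holds account #101; the UPDATE then happens purely in memory.
disk = {7: {"account_id": 101, "balance": 20000}}
pool = BufferPool(capacity=128, disk=disk)
pool.update_page(7, {"account_id": 101, "balance": 15000})   # page 7 is now dirty
```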
Redo/Undo Logging
- The database follows a strict Write-Ahead Logging (WAL) protocol, meaning changes are logged before they are written to the page.
- A log record describing the change (e.g., before/after values for account #101) is created and flushed to the permanent log file on disk.
- A transaction is only considered committed when its log records are secure.
Write-Ahead Logging. Source: CMU Database Systems
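The ordering rule is the whole trick. Here is a minimal sketch with a hypothetical WriteAheadLog class (real engines use binary records and log sequence numbers, but the "fsync the log before touching the page" discipline is the point):

```python
import json
import os

class WriteAheadLog:
    """Toy WAL: a change is appended and fsync'd to the log before any page is modified."""

    def __init__(self, path):
        self.log_file = open(path, "a", encoding="utf-8")

    def log_update(self, txn_id, page_id, before, after):
        record = {"txn": txn_id, "page": page_id, "before": before, "after": after}
        self.log_file.write(json.dumps(record) + "\n")
        self.log_file.flush()
        os.fsync(self.log_file.fileno())   # the record is durable once this returns

# Order matters: first make the log record durable, only then apply the change
# to the in-memory page (and, much later, to the data file itself).
wal = WriteAheadLog("redo.log")
wal.log_update(txn_id=42, page_id=7,
               before={"balance": 20000}, after={"balance": 15000})
```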
Checkpointing
- Periodically, the DBMS performs a checkpoint to flush all dirty pages from the buffer pool to the main data files on disk.
- This crucial background process synchronizes the in-memory state with durable storage and bounds recovery time; after a crash, the database only needs to replay logs created since the last successful checkpoint.
- This operation creates a trade-off, as checkpoints can cause I/O spikes that slow down transactions.
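Continuing the toy BufferPool sketch from earlier, a checkpoint is conceptually just "flush every dirty page, then remember how far the log was applied"; everything after that marker is what crash recovery must replay:

```python
def checkpoint(buffer_pool, data_file, current_lsn):
    """Flush every dirty page to the data file, then record how far the log was applied."""
    for page_id, (page, dirty) in list(buffer_pool.pages.items()):
        if dirty:
            data_file[page_id] = page                    # this burst of writes is the I/O spike
            buffer_pool.pages[page_id] = (page, False)   # the page is clean again
    return current_lsn   # crash recovery only replays log records written after this point
```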
Durable Storage
- The final layer is the physical disk (attached or a SAN), where the main data files and log files permanently reside.
Trade-offs and Scalability Challenges
- Buffer Pool Contention: A single shared buffer pool improves cache locality but limits concurrency. All writes contend for latches on common pages and I/O bandwidth. Scaling memory beyond one machine is hard: traditional engines cannot share RAM across servers. If the buffer pool is too large, maintenance (like scans or writes) slows down, but if too small it causes more disk I/O and thrashing.
- Log Flush Latency: Every commit requires flushing the WAL to disk. This creates a sequential I/O bottleneck. Databases mitigate this with group-commit (batching multiple transactions’ log writes), but spikes in write traffic or a slow disk can still cause queueing delays. A missing or corrupted WAL record can halt recovery entirely. In practice, each log write is an I/O op, so heavy write workloads incur high IOPS cost.
- Checkpoint/Flush Bottlenecks: When dirty pages are checkpointed, the DB must write large batches of data pages to disk. This can cause a sudden I/O spike that slows incoming transactions. To avoid overloading the disk, databases throttle writes, but that throttling in turn limits throughput. Large transactions or long-running updates can flood the buffer with dirty pages faster than they can be flushed, causing stalls. Moreover, on crash recovery a traditional DB must replay all logs since the last checkpoint, potentially taking minutes to catch up – further delaying availability.
- Single-Machine Storage Limits: Because all data and logs reside on one server’s disks or SAN, the database is constrained by that hardware’s capacity and durability. A single node can usually support only a few terabytes to, at most, a few tens of terabytes of data. Beyond that, storage partitioning or sharding is needed, which complicates the design. Also, every failure mode of that one host (disk failure, a full volume, an AZ outage, etc.) puts the database at risk.
- Slow Recovery and Failover: In a monolithic design, recovering from a crash means restarting the database process and replaying WAL records (redo/undo) to bring the buffer pool and data files into a consistent state. This can take time proportional to the transaction rate since the last checkpoint. Clients must wait (often minutes) before the database is available again. Similarly, promoting a standby replica (in a classic replica setup) can take tens of seconds or more as it now has to catch up on a full copy of the database. By contrast, Aurora’s architecture (discussed below) avoids most of this delay.
How Aurora Re-architects the Stack
To start addressing the limitations of relational databases, we reconceptualized the stack by decomposing the system into its fundamental building blocks. We recognized that the caching and logging layers were ripe for innovation. We could move these layers into a purpose-built, scale-out, self-healing, multitenant, database-optimized storage service. When we began building the distributed storage system, Amazon Aurora was born. We challenged the conventional ideas of caching and logging in a relational database, reinvented the database I/O layer, and reaped major scalability and resiliency benefits. Amazon Aurora is remarkably scalable and resilient, because it embraces the ideas of offloading redo logging, cell-based architecture, quorums, and fast database repairs. — AllThingsDistributed
This is where the magic happens. This distributed storage service receives the stream of logs and applies those changes to the data pages continuously in the background. This design makes the disruptive, I/O-heavy checkpoint process on the database node completely unnecessary, eliminating a major source of latency and contention. Aurora is a symphony of managed AWS services working together: EC2 for compute, a purpose-built log-and-storage service, DynamoDB for metadata, and S3 for backups.
The following diagram provides a high-level overview of Aurora's decoupled architecture.
SQL & Transaction Layer (Compute Nodes)
- Solves the problems of: Buffer Pool Contention and Single-Machine Storage Limits.
- The standard MySQL/PostgreSQL engine runs on stateless EC2 compute nodes.
- These nodes handle all query processing, transaction logic, and caching, but offload permanent page writes.
- Allows for adding up to 15 read replicas, each with an independent cache, to scale reads without cross-node contention.
Because the compute nodes are effectively stateless (aside from their cache), you can break the single-machine barrier. You can add numerous read replicas (up to 15) that all point to the same shared storage volume. Each replica has its own independent buffer cache and CPU, eliminating the cross-node contention and cache coherency overhead that plagues traditional clusters. When a new reader is added, its cache starts empty ("cold") and warms up as it serves queries by fetching pages from the shared storage volume. It locates these pages not by talking to the writer, but by consulting a shared metadata service that maps logical data pages to their physical location in the distributed storage layer, allowing for massive and efficient read scaling.
Redo Logging (Distributed Storage)
- Solves the problem of: Log Flush Latency and the single-point-of-failure risk of a traditional WAL.
- Introduces a fundamental architectural shift: the log is the database. The storage layer uses the log stream as the definitive source of truth.
- The compute node's only write I/O is sending log records to a distributed storage service; data pages are never written from the compute node.
- Writes are confirmed durable after a fast 4-of-6 quorum acknowledgment across multiple AZs, providing extreme fault tolerance.
Instead of writing to a single log file on a local disk, the compute node sends its redo log records over the network in parallel to a fleet of storage nodes spread across three Availability Zones. This is its only write I/O. The storage layer then uses this log stream as the source of truth to materialize data pages on demand or in the background.
The true innovation lies in its consensus protocol for durability. Each write is sent to six storage nodes, but the transaction is confirmed as committed once a quorum of any four nodes acknowledge it. This makes the commit process both extremely fast (it only waits for the fastest four responses) and incredibly fault-tolerant. In essence, Aurora transforms logging from a sequential, fragile bottleneck into a parallel, resilient, and high-throughput data stream that serves as the foundation for the entire database.
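Conceptually, the commit path looks like the sketch below: fan one log record out to all six storage nodes and declare durability after any four acknowledgments. The send function and node objects are placeholders; Aurora's actual protocol is covered in the next article.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

WRITE_QUORUM = 4          # any four of the six storage nodes are enough

def quorum_write(log_record, storage_nodes, send):
    """Fan one log record out to all six nodes; report success after four acknowledgments."""
    pool = ThreadPoolExecutor(max_workers=len(storage_nodes))
    futures = [pool.submit(send, node, log_record) for node in storage_nodes]
    acks = 0
    for future in as_completed(futures):
        if future.result():               # this node confirmed the record is durable
            acks += 1
        if acks >= WRITE_QUORUM:
            pool.shutdown(wait=False)     # don't wait for the two slowest replies
            return True                   # the commit can be acknowledged to the client
    pool.shutdown(wait=False)
    return False                          # quorum not reached: the write is not durable
```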
Crash Recovery (Storage Service)
- Solves the problem of: Slow Recovery and Failover.
- A direct result of the "log is the database" design is that compute node crash recovery is near-instant.
- The traditional, time-consuming WAL replay process on the database instance is completely eliminated.
- A "survivable cache," managed in a separate process, allows a restarted node to come back "warm" and immediately performant.
Aurora sidesteps this entirely. Since the distributed storage layer is the durable source of truth, a crashed or restarted compute instance simply reconnects to the already-consistent storage volume. There is no need for a WAL replay on the compute node itself. This is what enables failover times measured in seconds, not minutes.
Furthermore, the "survivable cache" is managed in a separate process from the database engine. This means that for many events, like an engine crash or a Zero-Downtime Patching operation, the database process can restart and find its valuable in-memory cache already warm and waiting. While a full host failure would clear the instance's RAM and thus its cache, Aurora's fundamental design ensures that recovery remains exceptionally fast regardless, as it never depends on the state of that local cache to begin with.
Checkpointing (Implicit in Storage)
- Solves the problem of: Checkpoint/Flush Bottlenecks.
- Because the log is the database, the disruptive checkpoint process is completely eliminated from the compute node.
- The storage layer continuously "materializes" new page versions from the log stream in the background.
- This granular, continuous process replaces the large, indiscriminate I/O storms of traditional checkpoints.
This process is fundamentally more efficient. A classic checkpoint is governed by the length of the entire log chain, forcing a huge, indiscriminate flush. Aurora’s continuous page materialization, however, is granular and driven by the needs of individual pages, completely eliminating I/O storms and leading to smoother, more predictable performance.
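In miniature, materialization is just "base page plus the log records that touch it, applied in LSN order." The sketch below is purely conceptual, not Aurora's on-disk format:

```python
def materialize_page(base_page, log_records):
    """A storage node's view: the current page is the base image plus every
    log record that touches it, applied in log-sequence order."""
    page = dict(base_page)
    for record in sorted(log_records, key=lambda r: r["lsn"]):
        page.update(record["after"])      # apply the change the record describes
    return page

# The page holding account #101, rebuilt entirely from the log stream.
page = materialize_page(
    {"account_id": 101, "balance": 20000},
    [{"lsn": 1, "after": {"balance": 15000}}],
)
```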
Durable Storage (Multi-AZ Shared Volume)
- Solves the problem of: Single-Machine Storage Limits and the risk of data loss from a single component or AZ failure.
- Aurora replaces local disks with a custom, log-structured, distributed storage volume that is shared by all compute nodes.
- Data is automatically replicated 6 ways across 3 Availability Zones (AZs) for extreme durability and availability.
- The volume automatically scales in 10GB segments up to 128 TB, eliminating the need for manual storage provisioning.
This architecture provides immense durability, easily tolerating the loss of an entire AZ without impacting data availability. It also offers seamless scalability. As your data grows, Aurora automatically adds new segments to the volume, scaling up to 128 TB without you having to provision storage in advance. This multi-AZ, log-structured store not only delivers high throughput and redundancy but also allows read requests to be served from any of the data copies, further distributing the load.
Backups (S3 Offload)
- Solves the problem of: Slow and performance-impacting backups.
- Continuously and asynchronously streams page snapshots and log journals to Amazon S3.
- The backup process is completely decoupled from the compute node, causing zero performance impact on the live database.
- Enables fast point-in-time restores and the ability to quickly provision database clones from S3.
Metadata (DynamoDB)
- Solves the problem of: Metadata access becoming a bottleneck and creating a single point of failure.
- Cluster metadata (like volume configuration and storage segment maps) is stored in Amazon DynamoDB.
- Using DynamoDB provides a fast, highly available, and globally accessible control plane.
- This decouples cluster state from any single database instance, ensuring all nodes have a consistent view.
All critical cluster metadata—such as the configuration of the storage volume, the map of which data lives on which storage segments, and backup pointers—resides in DynamoDB. This means metadata lookups are consistently fast and don't compete with user queries for resources. More importantly, it decouples the cluster's state from any single compute node. When you add a new instance or perform a failover, all nodes get the latest, consistent map of the cluster by querying DynamoDB, ensuring quick and reliable coordination without a single point of failure.
Failover/Discovery (Route 53 Endpoints)
- Solves the problem of: Slow Recovery and Failover by making the process transparent to applications.
- Aurora provides stable DNS names (endpoints) for the writer and reader instances, managed by Amazon Route 53.
- Applications connect to these endpoints, not to a specific database instance's IP address.
- On a failover, Aurora automatically updates the DNS record to point to the newly promoted writer, abstracting the complexity from the client.
Aurora solves this by using a service-oriented approach with Amazon Route 53. It provides a stable cluster endpoint (a DNS name) that always points to the current writer instance. If a failover occurs, Aurora automatically promotes a replica and, in coordination with Route 53, updates the cluster endpoint's DNS record to resolve to the new writer's IP address. Your application simply needs to handle the connection drop and reconnect to the exact same endpoint name. This DNS-based discovery, managed under the hood by RDS, makes failover fast and transparent, eliminating the need for complex client-side logic or manual reconfiguration.
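On the application side, the only required logic is "reconnect to the same endpoint name." A sketch using PyMySQL, where the endpoint, user, password, and database names are made up for illustration:

```python
import time
import pymysql   # psycopg2 plays the same role for Aurora PostgreSQL

# Hypothetical cluster endpoint; it always resolves to the current writer.
CLUSTER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"

def connect_with_retry(retries=10, delay=2):
    """Reconnect to the same DNS name; after a failover it resolves to the new writer."""
    for attempt in range(retries):
        try:
            return pymysql.connect(host=CLUSTER_ENDPOINT, user="app",
                                   password="...", database="bank")
        except pymysql.err.OperationalError:
            time.sleep(delay)   # failover in progress; the DNS record is being updated
    raise RuntimeError("writer did not become available in time")
```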
In essence, Aurora successfully dismantles the monolithic stack, reassembling it as a symphony of cloud-native services: compute engines on EC2, a distributed log-structured storage service, durable backups on S3, a metadata control plane on DynamoDB, and intelligent routing via Route 53. It is this modular, service-oriented design that allows Aurora to break through the performance and availability ceilings that limit traditional databases.
What’s Next
How does Aurora ensure that writes are both incredibly fast and highly durable when it has to coordinate across multiple servers and Availability Zones? The answer lies in the heart of its design: the purpose-built, distributed storage engine.
In the next article, we’ll take a deep dive into this storage layer. We will explore how its quorum-based protocol achieves consensus without the high latency of traditional methods, how it handles replication and consistency across AZs, and how its unique log-structured design makes crash recovery near-instantaneous. Stay tuned as we unpack the innovative engineering that powers Aurora's performance and resilience.
References & Further Reading
Amazon Aurora Deep Dive Series: The Scaling Bottleneck - Why Traditional Databases Fail and How Aurora Wins
Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
Amazon Aurora ascendant: How we designed a cloud-native relational database
Amazon Aurora: Cluster Cache Management
CMU Database Systems: Buffer Pools
AWS re:Invent 2024 - Deep dive into Amazon Aurora and its innovations (DAT405)
(Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of any organization I am associated with.)
-------------------------------
If you enjoyed this, let’s connect!
🔗Connect with me on LinkedIn and share ideas!