Technology blog by Rathish Kumar

Amazon Aurora Deep Dive Series: From Monolith to Modular - Inside Amazon Aurora’s Cloud-Native Database Architecture

June 19, 2025 / by Rathish Kumar B

An Aurora Deep Dive Series by Rathish Kumar B - Part 2

Amazon Aurora reimagines the database as a set of decoupled, distributed services—each built to scale, fail, and recover independently.

In our previous article we discussed why monolithic databases hit scalability and availability limits as workloads grow. Traditional RDBMS engines bundle query processing, transaction management, caching and storage into one tightly-coupled system. In such a monolithic design, every SQL write passes through a single process that parses the query, locks data, updates in-memory buffers, logs changes, and flushes to disk. By definition, “monolithic” means all functionally distinguishable components (parsing, processing, logging, etc.) are interwoven rather than separate. This coupling creates bottlenecks: for example, all sessions share one buffer pool and one write-ahead log (WAL) stream on the same machine. The rest of this article examines the traditional SQL transaction path and its tradeoffs, and then shows how Aurora breaks these layers apart into cloud-native services for greater throughput and resilience.
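The monolithic write path described above can be sketched as a toy in Python. This is an illustrative model only, not how any real engine (or Aurora) is implemented: the `MonolithicEngine` class, its `SET key value` statement format, and the dictionary standing in for disk are all assumptions made for the example. It shows every stage of a write running inside one process, behind one lock, against one buffer pool and one log.

```python
import threading


class MonolithicEngine:
    """Toy single-process engine: every write runs all stages in one place."""

    def __init__(self):
        self.lock = threading.Lock()  # one lock shared by all sessions
        self.buffer_pool = {}         # one shared in-memory cache
        self.wal = []                 # one write-ahead log stream
        self.disk = {}                # stands in for local storage

    def write(self, statement):
        # Parse: this toy only understands "SET key value".
        verb, key, value = statement.split(maxsplit=2)
        if verb.upper() != "SET":
            raise ValueError("unsupported statement")
        # Lock, update the buffer, log, and flush, all in the same process.
        with self.lock:
            self.buffer_pool[key] = value  # update the in-memory buffer
            self.wal.append((key, value))  # log the change
            self.disk[key] = value         # flush to "disk"
        return len(self.wal)               # number of log records so far


engine = MonolithicEngine()
engine.write("SET a 1")
position = engine.write("SET b 2")
```

Because every session funnels through the same `lock`, `buffer_pool`, and `wal`, none of the stages can be scaled or recovered independently of the others, which is the coupling the article goes on to examine.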

Amazon Aurora Deep Dive Series: The Scaling Bottleneck - Why Traditional Databases Fail and How Aurora Wins

June 12, 2025 / by Rathish Kumar B

An Aurora Deep Dive Series by Rathish Kumar - Part 1

Scaling a database sounds simple—until you're staring down a production outage.

The reality is that for decades, the very design of our databases has been at odds with the demands of modern, high-growth applications.

Most traditional database systems begin with a monolithic architecture. In this model, everything—compute, memory, and storage—is tightly coupled and resides on a single server. This all-in-one approach is straightforward when you're starting small. But as your traffic and data volumes explode, that single server inevitably becomes a bottleneck. The first, most common response is to scale vertically by upgrading to a bigger, more powerful server. However, this strategy quickly runs into hard physical and cost limitations. Moreover, you're left with a critical single point of failure, where one hardware issue can bring your entire application to a halt.

How to perform join operation in BigQuery? Exploring BigQuery Join Operations: Broadcast and Hashing Joins & Nested and Repeated Structures.

October 27, 2023 / by Rathish Kumar B

BigQuery: SQL Joins - Photo by Resource Database on Unsplash

SQL joins combine columns from multiple tables into a single result set. In a typical relational model we use normalized tables, where each table represents an entity (for example, employee or department) and its relationships. When we need data from more than one table, for example the employee name and the employee's department, we use a join to combine the employee name column from the employee table with the department name column from the department table, based on the employee number key column, which is available in both tables.
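As a minimal sketch of the join described above, the following uses Python's built-in `sqlite3` module rather than BigQuery, so it only illustrates the SQL idea and not BigQuery's execution strategies. The table layouts, the column names (`emp_no`, `emp_name`, `dept_name`), and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database with two hypothetical tables that share emp_no.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE department (emp_no INTEGER, dept_name TEXT);
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Engineering');
""")

# Combine the name column from one table with the department column
# from the other, matching rows on the shared key column.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.emp_no = d.emp_no
    ORDER BY e.emp_no
""").fetchall()
conn.close()
```

Each tuple in `rows` pairs an employee name with the department name found under the same `emp_no`.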

How to Choose a Data Serialization/Encoding Format? A Practical Guide for Engineers

September 05, 2023 / by Rathish Kumar B

Data Encoding & Decoding. Image Source: Unsplash

In software, we often work with data structures such as lists and tables. These structures are designed for fast, efficient access while a program holds them in memory. However, sometimes we need to move this data out of memory, for example to save it to a file or send it over the network. To do this, we have to convert the data into a self-contained sequence of bytes (0s and 1s), which is quite different from the in-memory data structures. This process is called encoding or serialization.
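A small round trip makes the idea concrete. This sketch assumes JSON as the encoding format, which is only one of many options, and uses a made-up `record` dictionary as the in-memory data.

```python
import json

# An in-memory data structure (hypothetical sample record).
record = {"user": "alice", "scores": [90, 85], "active": True}

# Encoding (serialization): in-memory structure -> sequence of bytes.
encoded = json.dumps(record).encode("utf-8")

# Decoding (deserialization): sequence of bytes -> in-memory structure.
decoded = json.loads(encoded.decode("utf-8"))
```

The `encoded` value is what would be written to a file or sent over the network, and `decoded` is the structure a receiving program rebuilds from it.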

Unlock Advanced Data Visualization: The Complete Guide to Installing and Using Apache Superset on Linux

July 10, 2023 / by Rathish Kumar B

Data Visualization - Apache Superset Guide. Image Source: Unsplash

Note: This article provides a comprehensive guide on deploying and using Apache Superset on a Linux server. It covers the installation and configuration process, as well as the benefits and features of Superset. While the primary focus is on Superset, we will also explore the broader concepts of business intelligence, data analytics, and visualization.

GCP Cloud Pub/Sub Replay: Seeking to timestamp & Seeking to snapshots

March 09, 2023 / by Rathish Kumar B

Google Cloud Pub/Sub Replay (Pixabay)

Let's assume you have a data pipeline deployed on Google Cloud Platform: a publisher client publishes events to a Cloud Pub/Sub topic, and a data processing application reads them from a Cloud Pub/Sub subscription, processes them, and writes them to a BigQuery table.

[Solved] Access is denied. Check credentials and try again: Microsoft Graph - Calendar API

March 02, 2023 / by Rathish Kumar B

Microsoft Graph (Source: microsoft.com)

When you send a request to the Microsoft Graph API, it responds with an access denied error. You might have followed the documentation, added the correct permission, and granted admin consent for it, but the API still returns the same error. Let's look at the solution to this issue in this short article.

Streaming Analytics in Google Cloud Platform (GCP) - Building Data Pipeline with Apache Beam

January 18, 2023 / by Rathish Kumar B

Building Apache Beam Data Pipeline (Source: Pixabay)

In the introductory article of this series, Streaming Analytics in Google Cloud Platform (GCP) - Introduction, we covered the basics of streaming analytics, its importance, and example use cases, along with a short introduction to the Google Cloud services we will use to build a streaming analytics system in Google Cloud Platform.

Streaming Analytics in Google Cloud Platform (GCP) - Setting Up The Environment

January 13, 2023 / by Rathish Kumar B

Streaming Analytics in GCP (Source: Pixabay)

Hello everyone. In the previous article, Streaming Analytics in Google Cloud Platform - Introduction, we covered what streaming analytics is, which services we are going to use, and a quick introduction to each service. In this part of the series, we will install the SDKs and libraries and set up our environment.

Streaming Analytics in Google Cloud Platform (GCP) - Introduction

January 10, 2023 / by Rathish Kumar B

Streaming Analytics in Google Cloud Platform (image source - pixabay)

From data-to-decision in real-time

Welcome to our new series on building a streaming analytics system in the Google Cloud Platform! Let's begin with a quick introduction. Streaming analytics is the process of analysing data in real time as it is received. It enables an organisation to gain insights and make decisions based on the most up-to-date data. This is crucial for businesses, as it allows organisations to respond to changes and opportunities in a timely manner.

Installing and configuring Docker Engine and Docker Compose on CentOS

June 03, 2021 / by Rathish Kumar B

Installing & Configuring Docker Engine & Docker-compose on CentOS. Source: pixabay

I have written this post as a quick reference guide for installing and configuring Docker Engine and Docker Compose on CentOS servers. Knowing the basics of Docker containers helps you focus on the end goal of solving problems rather than spending your energy on less important aspects. If you have experimented with multiple packages and applications, you probably know that Docker containers make it easy to install and run software without worrying about dependencies, scripts, configurations, and so on. Also, when we build and release our solutions, it is important to let others consume them through a simple process, and Docker images help you achieve that. I will be covering the topics below in this article:

Real-time Monitoring and Log Streaming in Google Cloud Platform (GCP)

May 19, 2021 / by Rathish Kumar B

Monitoring Dashboard with charts - credit pixabay

Getting insights into the performance, availability, and health of infrastructure and applications is critical for building and managing reliable systems. When we are dealing with clusters of instances, it becomes very challenging to collect and aggregate data and derive actionable insights from it in real time. There are monitoring tools available to address this challenge, both open source and commercial. In this article, we discuss how to achieve real-time log streaming, analytics, and monitoring in Google Cloud Platform.