Everything you need to know about System Design
Nowadays, most tech companies include a System Design round in their interview process to assess a candidate's ability to build complex, scalable, and distributed systems. System design is an open-ended process, meaning there is no single right or wrong answer; the system just needs to be efficient enough to meet the desired requirements and scale.
What is system design?
System design is the process of architecting the components of a system, such as modules, the interfaces between components, and databases, and defining the data flow between the different components.
Before jumping in and starting to design systems, there are a few basic things that need to be understood.
1. Load balancing
Load balancing is the practice of efficiently distributing traffic across a group of backend servers, also known as a server pool. It ensures that no single server bears too much traffic, and it increases the responsiveness and availability of the application. Load balancers are the components that achieve this distribution across multiple servers. A load balancer performs the following functions:
1. Distributes client requests or network load efficiently across multiple servers
2. Ensures high availability and reliability by sending requests only to servers that are online
3. Provides the flexibility to add or subtract servers as demand dictates
Different load balancing algorithms provide different benefits; the choice of load balancing method depends on your needs (a short sketch of the first two follows the list):
- Round Robin — Requests are distributed across the group of servers sequentially.
- Least Connections — A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
- Least Time — Sends requests to the server selected by a formula that combines the fastest response time and fewest active connections. (Exclusive to NGINX Plus.)
- Hash — Distributes requests based on a key you define, such as the client IP address or the request URL.
- IP Hash — The IP address of the client is used to determine which server receives the request.
- Random with Two Choices — Picks two servers at random and sends the request to the one then chosen by applying the Least Connections algorithm.
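To make the first two algorithms concrete, here is a minimal Python sketch (the server addresses and connection counts are invented for the example; a real load balancer tracks live connection state):

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backend pool

# Round Robin: hand out servers in a fixed, repeating order.
round_robin = cycle(servers)

def pick_round_robin() -> str:
    return next(round_robin)

# Least Connections: track open connections per server, pick the minimum.
open_connections = {s: 0 for s in servers}

def pick_least_connections() -> str:
    server = min(open_connections, key=open_connections.get)
    open_connections[server] += 1  # the new request occupies a connection
    return server
```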
2. Caching
Caching is a technique that stores copies of frequently used application data in a layer of smaller, faster memory in order to improve data retrieval times and throughput and to reduce compute costs.
Caching is used to speed up a system: it improves latency and reduces the number of network calls made to the database. Some popular caching systems are Memcached and Redis.
Client-level caching can be done so that the client does not need to send every request to the server. Similarly, the server may also use a cache, so that it does not always need to hit the database to fetch data. We can also place a cache between any two components.
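As a minimal illustration of the server-side case, here is a read-through cache sketch in Python; `fetch_from_database` is a made-up stand-in for whatever slow primary store the system uses:

```python
cache: dict[str, str] = {}

def fetch_from_database(key: str) -> str:
    # Placeholder for the slow primary data store (database, remote API, ...).
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:                  # cache hit: served from fast memory
        return cache[key]
    value = fetch_from_database(key)  # cache miss: fall back to the slow store
    cache[key] = value                # populate the cache for next time
    return value
```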
When to use caching?
- Caching is most helpful when the data you need is slow to access, possibly because of slow hardware, having to go over the network, or complex computation.
- Caching is helpful in systems with many requests for static or slowly changing data, because repeated requests can be served from the cache and will still be up to date.
- Caching can also reduce load on primary data stores, which has the downstream effect of reducing service costs to reach performant response times.
Ultimately the features of a successful caching layer are highly situation-dependent. Creating and optimizing a cache requires tracking latency and throughput metrics, and tuning parameters to the particular data access patterns of the system.
When not to use caching?
- Caching isn’t helpful when it takes just as long to access the cache as it does to access the primary data.
- Caching doesn’t work as well when requests have low repetition (higher randomness), because caching performance comes from repeated memory access patterns.
- Caching isn’t helpful when the data changes frequently, as the cached version gets out of sync and the primary data store must be accessed every time.
When requested data is found in the cache, it's called a Cache Hit. When it is not found, it's called a Cache Miss, which hurts performance; a high miss rate is a sign of poor cache design. For better performance, we need to increase the hit rate and decrease the miss rate.
Cache Invalidation — We usually use the cache as a faster data source that keeps a copy of the database. If data is modified in the DB while the cache still holds the previous version, the cached copy is called stale data. We therefore need a technique for invalidating stale cache data; otherwise, the application will show inconsistent behavior. Also, since the cache has limited memory, we need to update or evict the data stored in it. This process is known as Cache Invalidation.
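One common (though by no means the only) invalidation strategy is a time-to-live (TTL): each entry expires after a fixed interval, which bounds how stale it can get. A minimal sketch, assuming a made-up 60-second freshness window:

```python
import time

TTL_SECONDS = 60  # assumed freshness window for this example
ttl_cache: dict[str, tuple[str, float]] = {}  # key -> (value, expiry timestamp)

def fetch_from_database(key: str) -> str:
    return f"value-for-{key}"  # stand-in for the slow primary store

def get_with_ttl(key: str) -> str:
    entry = ttl_cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:  # still fresh: cache hit
            return value
        del ttl_cache[key]            # expired: invalidate the stale entry
    value = fetch_from_database(key)  # miss or stale: reload from the source
    ttl_cache[key] = (value, time.time() + TTL_SECONDS)
    return value
```

Other strategies, such as write-through caches or explicit invalidation on every database write, trade extra write cost for fresher reads.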
3. Databases
Choosing the right database is always an important decision. But what are the factors which can help us choose? First of all, what kind of data are we working with? Is it records or file systems or audio/video content? And what kind of processing do we intend to do on that data? Do we need to search for something or run sophisticated analytics algorithms?
Before that, let's try to understand the different types of databases:
- Relational Database
- A relational database is a type of database that stores and provides access to data points that are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.
- Relational databases use Structured Query Language (SQL).
- Relational databases work best when the data they contain doesn’t change very often, and when accuracy is crucial. Relational databases are, for instance, often found in financial applications.
- If the information is structured and can be represented as a table, and if we need our transactions to be atomic, consistent, isolated, and durable (ACID), we go with a relational database; a small transaction sketch follows this list.
- Popular examples of relational databases are MySQL, Oracle, SQLite, and Postgres.
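To make the ACID point concrete, here is a small transaction sketch using Python's built-in sqlite3 module; the accounts table and the transfer amount are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

# A money transfer must be atomic: either both updates happen or neither does.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure the rollback leaves both balances untouched

print(conn.execute("SELECT id, balance FROM accounts").fetchall())
# [(1, 70), (2, 30)]
```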
- Non-Relational Database
Non-relational databases (often called NoSQL databases) differ from traditional relational databases in that they store their data in a non-tabular form. Instead, non-relational databases might be based on data structures like documents, key-value pairs, column families, and graphs. This ability to digest and organize various types of information side by side makes non-relational databases much more flexible than relational databases.
Non-relational databases are often used when large quantities of complex and diverse data need to be organised.
Non-relational databases often perform faster because a query doesn’t have to view several tables in order to deliver an answer, as relational datasets often do. Non-relational databases are therefore ideal for storing data that may be changed frequently or for applications that handle many different kinds of data. They can support rapidly developing applications requiring a dynamic database able to change quickly and to accommodate large amounts of complex, unstructured data.
- There are 4 major types of non-relational DBs (a sketch of the first two models follows this list):
- Key-Value based: stores data in an array of key-value pairs. The key serves as a unique ID linked to the value we want to store. A value can typically only be retrieved by referencing its key. Common examples are Redis, which is an in-memory key-value database, and Dynamo.
- Document based: stores data in documents similar to JSON objects. The most popular example is MongoDB, where data is stored in a BSON document-based format.
- Graph based: stores data in nodes (as entities) and edges (as relationships between entities). A common example is Neo4j.
- Wide-column based: stores data in a tabular format, but unlike relational databases, the columns are dynamic. The most popular examples are HBase and Cassandra.
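As a tiny illustration of the first two models, here is a plain-Python sketch (the keys and records are made up; real stores like Redis or MongoDB add persistence, indexing, and replication on top of these ideas):

```python
# Key-value model: opaque values addressed only by their key.
kv_store: dict[str, bytes] = {}
kv_store["user:42:session"] = b"token-abc123"
print(kv_store["user:42:session"])

# Document model: self-describing, schema-flexible records.
documents: dict[str, dict] = {}
documents["user:42"] = {
    "name": "Ada",
    "tags": ["admin", "beta"],      # nested structures are allowed
    "address": {"city": "London"},  # no fixed schema across documents
}
print(documents["user:42"]["address"]["city"])
```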
CAP Theorem
The CAP Theorem says that any distributed database can only satisfy two of the three features:
- Consistency: every node responds with the most recent version of the data
- Availability: any node can send a response
- Partition Tolerance: the system continues working even if communication between any of the nodes is broken
We live in a physical world and can't guarantee the stability of a network, so distributed databases must choose Partition Tolerance. This implies a tradeoff between Consistency and Availability. In other words, if part of the cluster goes down, the system can either satisfy Consistency, rolling back any unfinished operations and waiting to respond until all nodes are stable again, or satisfy Availability, continuing to respond but risking inconsistencies.
A system is called a CP database if it provides Consistency and Partition Tolerance, and an AP database if it provides Availability and Partition Tolerance.
4. Sharding
Modern software systems generate or collect far more data than a single database system can handle. Certainly, the capacity of the database can be increased, but eventually this runs into the physical limits of the server, or into higher costs in cloud environments. The alternative approach is to split the data across multiple servers, that is, to distribute the collected or generated data across a cluster of database servers.
What is Sharding or Data Partitioning?
It is a database architecture pattern in which we split a large dataset into smaller chunks (a logical split) and store/distribute these chunks across different databases/servers. Each chunk is called a "shard", and the schema remains the same as in the original database.
A shard is a horizontal partition, meaning the database table is split up by drawing a horizontal line between rows. This is in contrast to a vertical partition, where partitions are made between columns.
With the help of sharding, the database can serve more traffic because each server in the cluster has to serve only a fraction of the total number of requests.
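A common way to route a record to a shard is to hash a shard key, such as the user ID, and take the result modulo the number of shards. A minimal sketch, with invented shard names:

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    # A stable hash ensures the same key always routes to the same shard.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user-42"))  # every lookup for user-42 hits the same shard
```

Note that with plain modulo hashing, adding or removing a shard remaps most keys; consistent hashing is the usual remedy for that.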
Advantages of sharding:
- Scalability
- High Availability
- Decreases query latency
- Increases bandwidth (can handle more write connections)
Disadvantages:
- Joining data from multiple shards will be a comparatively expensive operation
- Introduces complexity in the system design
When to use sharding
The benefits of a sharded architecture, versus other kinds of database partitioning, are:
- Leveraging commodity hardware instead of high-end machines
- Quickly scaling by adding more shards
- Better performance because each machine is under less load
Sharding is particularly useful when a single database server:
- Can’t hold all the data
- Can’t compute all the query responses fast enough
- Can’t handle the number of concurrent connections
You might also need sharding when you need to maintain distinct geographic regions, even if the above compute constraints haven’t been hit. Either your service will be faster when the data servers are physically closer to the users, or there’s legislation about data location and usage in one of the countries your service operates in.
When not to use sharding
The disadvantages of database sharding are all about complexity. Queries become more complex because they have to resolve the correct shard key, and need to avoid multi-shard queries where possible.
If the shards can't be entirely isolated, you need to implement eventual consistency for duplicated data or for upholding relational constraints. The implementation and deployment of your database get a lot more complex, as do failovers, backups, and other kinds of maintenance. Essentially, you should only use database sharding when you absolutely have to.
5. Pub-Sub and Queues
Publish/subscribe messaging, or pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any message published to a topic is immediately received by all of the subscribers to the topic. Pub/sub messaging can be used to enable event-driven architectures, or to decouple applications in order to increase performance, reliability and scalability.
Message queues are a kind of messaging-oriented middleware where producers push new messages to a named First-In, First-Out (FIFO) queue, which consumers can then pull from. Message queues are also called point-to-point messaging because there is a one-to-one relationship between a message’s producer and consumer. There can be many producers and consumers using the same queue, but any particular message will only have one producer and one consumer.
Both queues and pub-sub mechanisms allow applications to process messages asynchronously. In this architecture, the sender (publisher) and the receiver (subscriber) of a message work independently, with the messaging system acting as a middleman. This can eliminate bottlenecks and help the system operate more efficiently.
Which to use?
Whether to use queues or pub-sub depends mostly on how many message consumers the system has. If a message needs to have only one consumer, then the message queue is the right approach. If a message needs to have possibly many consumers that get a copy, the pub-sub approach is best.
Implementations also differ in the particular message-passing guarantees they provide. Some feature variations to note are:
- Persistence: the messages are saved to persistent storage so they can be recovered in case the message broker itself goes down.
- Replays: the messages are stored even after they are consumed so a service can “replay” the message stream in failure cases to recover properly.
- Ordering: the messages always arrive at the consumer in a particular order.
Pub-sub example
Many social networks already use parts of the pub-sub model and call it "following" users. Let's look at an example to build a better intuition. Imagine a simple social network that allows people to share recipes, follow their friends, and see a timeline of their friends' recipes.
When a user shares a recipe, they can put it in a topic. One user might categorize by what meal it is, another user might categorize by the season of the ingredients. When a user follows another user, they are subscribing to the recipes their friend publishes.
Followers can choose to see everything that’s published, or only some topics that they’re interested in. Followers can also add their own content filters, like excluding recipes that use certain ingredients.
Users can follow as many other users as they want, so their timeline will be full of recipes from many users, but each recipe only comes from one publishing user. Similarly, a user can be followed by many other users, and all of the followers will see a copy of the recipe on their timeline.
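Here is a minimal in-memory sketch of that idea in Python; the topic name and the subscriber callbacks are invented for the example, and a production system would use a broker such as Kafka, RabbitMQ, or a cloud pub/sub service instead:

```python
from collections import defaultdict
from typing import Callable

subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

def subscribe(topic: str, callback: Callable[[str], None]) -> None:
    subscribers[topic].append(callback)

def publish(topic: str, message: str) -> None:
    # Every subscriber to the topic receives its own copy of the message.
    for callback in subscribers[topic]:
        callback(message)

subscribe("dinner-recipes", lambda m: print(f"Alice's timeline: {m}"))
subscribe("dinner-recipes", lambda m: print(f"Bob's timeline: {m}"))
publish("dinner-recipes", "Mushroom risotto")  # both followers see a copy
```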
Message queue example
Now let’s go over a simple example of a message queue. Imagine a website that sells a high volume of T-shirts. Customers make online orders for a T-shirt and the web server produces corresponding order requests and places them on a queue for processing in the backend.
Depending on demand and availability, there could be one or several computers working to fulfill the orders. More order fulfillment servers might be needed if it’s a peak demand time of year, or if the fulfillment process for some orders is more complex.
Each order only needs to be processed once, so only one of the order fulfillment servers needs to take the order request off the queue. If for some reason multiple fulfillment servers got a copy of an order, that order would be mistakenly fulfilled multiple times.
In summary, using a queue enables the order-taking and order-fulfilling logic to be separated, and ensures that each order will be fulfilled exactly once.
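A minimal sketch of the same idea using Python's standard-library queue module; the order IDs and the worker count are made up, and a real deployment would use a broker such as RabbitMQ or SQS:

```python
import queue
import threading

orders: queue.Queue[str] = queue.Queue()

def fulfillment_worker(worker_id: int) -> None:
    while True:
        order = orders.get()  # each order is taken by exactly one worker
        print(f"worker {worker_id} fulfilling {order}")
        orders.task_done()

# Several workers can share one queue; each message still has one consumer.
for i in range(2):
    threading.Thread(target=fulfillment_worker, args=(i,), daemon=True).start()

for n in range(5):
    orders.put(f"order-{n}")  # the web server produces order requests

orders.join()  # block until every order has been processed
```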
This isn’t all! Beyond these concepts, you also have to learn about web servers, proxies, network protocols, and availability.
This article gives you a basic level of knowledge and will help you speak fluently in a system design interview.
Once you understand the basics, I would really recommend going through this link, which contains detailed explanations of system design concepts. Most of the definitions and explanations here are taken from that website.
There are a lot of system design interview questions answered at this link: https://www.geeksforgeeks.org/top-10-system-design-interview-questions-and-answers/