Scalability is the ability to support the required quality of service as the system load increases without changing the system. A system can be considered scalable if, as the load increases, the system still responds within the acceptable limits.
The most important principle for building scalable systems is that, if you need your system to exhibit this characteristic, you have to design it in up front.

To understand scalability, you must first understand the capacity of a system, which is defined as the maximum number of processes or users a system can handle and still maintain the quality of service. If a system is running at capacity and can no longer respond within an acceptable time frame, then it has reached its maximum scalability.

There is no one-size-fits-all prescriptive approach that will solve your scalability problems.

To scale a system that has reached capacity, you must add hardware. This hardware can be added vertically or horizontally.

Vertical scaling

Vertical scaling, also described as scale up, typically involves adding additional processors, memory, or disks to the current machine(s). Generally, this form of scaling employs only one instance of the operating system.

Such vertical scaling of existing systems also enables them to use virtualization technology more effectively, as it provides more resources for the hosted set of operating system and application modules to share.

Taking advantage of these added resources is also part of “scaling up”. One example is increasing the number of Apache daemon processes running on the machine.

If you need scalability urgently, going vertical is probably the easiest option.

However, you cannot scale up beyond a certain limit: a single machine can hold only so many processors and so much memory and disk.

Horizontal scaling

Horizontal scaling, or scale out, usually refers to tying multiple independent computers together to provide more processing power, thus increasing the overall system capacity.

Horizontal scaling typically implies multiple instances of operating systems, residing on separate servers, such as adding a new computer to a distributed software application.

An example might be scaling out from one (1) Web server system to three (3).

Most clustering solutions, distributed file systems, and load balancers help you with horizontal scalability.

Hundreds of small computers may be configured in a cluster to obtain aggregate computing power.

Horizontal scalability isn’t cheap. The application has to be built from the ground up to run on multiple servers as a single application.

The architecture you create must be able to handle the vertical or horizontal scaling of the hardware. Designing software for vertical scaling is easier than designing it for horizontal scaling. Why? Adding more processors or memory typically does not have an impact on your architecture, but having your architecture run on multiple machines and still appear to be one system is more difficult.

Generally, three techniques can be employed to improve scalability:

• Increase the resources (bigger machines, more disks, and more memory).

• Improve the software.

• Increase the resources and improve the software.

Although the long-term answer is the third technique, my bias is toward the second. Good design at the beginning of a project is the most cost-effective way to improve scalability. No doubt you will need greater resources to deal with higher demands, but this is never the whole story. Although it can take you part of the distance, throwing money at the problem cannot ensure scalability. I don’t deny the need to spend money at certain points in the process. Rather, I suggest strategic places to spend and strategic opportunities during the design that can give application designers the biggest bang for their buck, thereby reducing their need to purchase more resources than necessary.

The fact is that millions of people use the Internet every day. That is obvious, and the increasing numbers are the primary reason that application architects worry about things like scalability in the first place.

A primary metric for scalability is throughput, which measures transactions or users per second. There is no such thing as infinite scalability, that is, the ability to handle arbitrary demand. Every application has its limits. In fact, for many deployments it is satisfying enough to achieve linear scalability, where throughput grows in proportion to the resources added, although the optimizer in all of us wants to achieve much better than that. Not unexpectedly, the most successful examples of scalability are those that simply minimize the rate at which new resources are required.

To enhance scalability, data should be as close as possible to its point of use. For this reason, caching is often implemented to improve the speed of data access. For Web 2.0 applications, where caching is a key to scaling, new breeds of caches have been developed, including application-level distributed caches (such as Memcached) and distributed caching file systems (such as Hadoop, Lustre, and MogileFS).
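As a rough illustration of keeping data close to its point of use, here is a minimal sketch of an in-process least-recently-used cache in Java. The class name `LruCache`, the `loader` function, and the `loadCount` counter are all made up for this example; a real deployment would more likely use a distributed cache such as Memcached, as mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-process LRU cache; names and sizes are illustrative only. */
class LruCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader; // stands in for a slow lookup (database, network)
    private int loads = 0;               // how many times the slow lookup actually ran

    LruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the cache is over its limit.
                return size() > maxEntries;
            }
        };
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            // Cache miss: pay for the slow lookup once, then remember the result.
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }
}
```

Repeated calls to `get` with the same key reach the loader only once while the entry stays in the cache, which is where the saving comes from.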


Synchronized blocks reduce the scalability of applications by forcing concurrent threads of execution to run one at a time. Setting up synchronization also requires a nontrivial amount of resources from the JVM. Therefore, synchronized blocks should be kept small and used only when absolutely needed.
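A minimal sketch of what keeping the serialized section small can look like in Java. The class `WordStats` and its methods are invented for illustration: the per-call string work happens outside the lock, and only the update of the shared map is synchronized.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical example class; not taken from any library. */
class WordStats {
    private final Object lock = new Object();
    private final Map<String, Integer> counts = new HashMap<>();

    void record(String rawWord) {
        // Work that touches only local data stays outside the lock,
        // so threads can do it in parallel.
        String word = rawWord.trim().toLowerCase(Locale.ROOT);

        // Only the update of shared state is serialized.
        synchronized (lock) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    int count(String word) {
        synchronized (lock) {
            return counts.getOrDefault(word, 0);
        }
    }
}
```

Declaring the whole `record` method `synchronized` would also be correct, but it would make the string normalization wait for the lock as well.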


The same rule applies to using standard Java classes. The JDK often provides two implementations of a data structure, one with synchronization and one without (e.g., Vector is synchronized and ArrayList is not). When using JDK classes, pick synchronized implementations only when synchronization is actually needed; use unsynchronized versions in all other cases.
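The choice can be sketched as follows. The class `ListChoice` and both method names are hypothetical. The first method keeps its list confined to one thread and uses a plain `ArrayList`; the second shares a list across threads and therefore wraps it with `Collections.synchronizedList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical helper class used only for this illustration. */
class ListChoice {

    // The list never leaves this method, so no synchronization is needed.
    static int sumOfSquares(int n) {
        List<Integer> local = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            local.add(i * i);
        }
        int sum = 0;
        for (int value : local) {
            sum += value;
        }
        return sum;
    }

    // The list is written by several threads at once, so it must be synchronized.
    static List<Integer> collectInParallel(int n) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(i -> shared.add(i));
        return shared;
    }
}
```

The order of elements returned by `collectInParallel` is not defined, since it depends on how the threads interleave.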


Use clustering to increase scalability: As long as your application doesn’t require the servers in your cluster to communicate with each other, adding more servers keeps latency low while increasing throughput.


For a successful scalable web application, all layers have to scale equally. This includes the storage layer (clustered file systems, S3, etc.), the database layer (partitioning, federation), the application layer (Memcached, scale-out, Terracotta, Tomcat clustering, etc.), the web layer, the load balancer, the firewall, and so on.

For example, if you don’t have a way to implement multiple load balancers to handle your future web traffic load, it doesn’t really matter how much money and effort you put into horizontal scalability of the web layer. Your traffic will be limited to only what your load balancer can push.
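The load balancer example can be put in numbers with a small sketch. It assumes, as a simplification, that every request passes through every layer, so the system as a whole can push no more than its slowest layer. The class name, layer names, and capacities are all hypothetical.

```java
import java.util.Map;

/** Hypothetical model: each layer has a capacity in requests per second. */
class PipelineCapacity {

    // The end-to-end throughput is capped by the layer with the lowest capacity.
    static int effectiveThroughput(Map<String, Integer> capacityByLayer) {
        return capacityByLayer.values().stream()
                .mapToInt(Integer::intValue)
                .min()
                .orElse(0);
    }

    // Returns the name of the layer with the lowest capacity, or null if there are no layers.
    static String bottleneck(Map<String, Integer> capacityByLayer) {
        String worst = null;
        int lowest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> layer : capacityByLayer.entrySet()) {
            if (layer.getValue() < lowest) {
                lowest = layer.getValue();
                worst = layer.getKey();
            }
        }
        return worst;
    }
}
```

With made-up capacities of 5,000 requests per second for a single load balancer, 12,000 for three web servers, and 8,000 for the database, the effective throughput is 5,000, and adding a fourth web server changes nothing.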

In web applications, scalability is generally related to the increasing number of web users or visitors. Alternatively, the pressure can come from an increasing amount of data. At some point one of the resources reaches its hardware limit and becomes what is called a bottleneck.

In reality, the limit is very often reached because the software is not using resources efficiently. Systems designed in a better way would not demand so many resources and would have improved overall performance.

Making a system scalable needs to have a well-defined cause and an expected effect; otherwise it’s a waste.

Some of the ways to improve scalability are:

–       Connect to database only when absolutely needed

–       Cache the data

–       Data should be as close as possible to its point of use

–       Keep synchronized blocks small and use them only when absolutely needed

–       Use asynchronous processing as far as possible

–       Make network calls only when absolutely needed

–       Pool expensive resources
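Two of the items above, pooling expensive resources and asynchronous processing, can be sketched together in Java. In this minimal example the pooled resource is a set of threads. The class `AsyncWorker`, the method `slowLength`, and the pool size are assumptions made for illustration, and `slowLength` merely stands in for a slow call such as a database or network request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical worker that reuses one thread pool for all requests. */
class AsyncWorker {
    // Threads are expensive to create, so they are created once and reused.
    private final ExecutorService pool;

    AsyncWorker(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    // Placeholder for a slow operation (database query, remote call).
    static int slowLength(String input) {
        return input.length();
    }

    List<Integer> lengthsOf(List<String> inputs) {
        // Submit everything first so the tasks run concurrently on the pool.
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (String input : inputs) {
            pending.add(CompletableFuture.supplyAsync(() -> slowLength(input), pool));
        }
        // Collect results in submission order; join() waits only if a task is not done yet.
        List<Integer> results = new ArrayList<>();
        for (CompletableFuture<Integer> future : pending) {
            results.add(future.join());
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The pool is meant to live as long as the application and to be shut down once at the end, not once per request.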

I don’t pretend to have the knowledge required to cover this topic at length. However, I do have some exposure to it, and I am sharing what I have learned so that others can benefit from my experience.