Monday, January 24, 2011

Applications have to be built stateless to scale and shrink on the cloud

Context of the presentation
The cloud has succeeded with the pay-per-use model in software because it invented elasticity:

* If 2 companies in opposite time zones each need 10 machines during the day and 1 machine during the night, a cloud provider can serve both companies with 11 machines - 10 for one and 1 for the other at any given point in time.
* If each had run them in its own enterprise, they would have bought 10 machines each (20 in total), and each company's 9 extra machines would have sat idle during the night.
* Time zone is not the only reason for sharing idle capacity:

o Computing needs could also be seasonal. Some businesses might need very high computing capacity during Christmas, while others might need it during financial year closure, and so on.
o Some may not be able to predict when they will need that capacity, e.g. the Slashdot effect. Either way, by sharing, you let others use your idle capacity. This enables pay-per-use.
* You can extend the same concept to the cost of the platforms - application servers, database and applications.
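The arithmetic in the first two bullets can be checked with a minimal sketch. The hourly demand profiles below are assumptions made up for illustration (12 "day" hours at 10 machines, 12 "night" hours at 1 machine); only the 10, 1, 11 and 20 figures come from the example above.

```python
# Hypothetical hourly machine demand for two companies in opposite time zones.
company_a = [10] * 12 + [1] * 12   # busy in the first half of the day
company_b = [1] * 12 + [10] * 12   # same shape, shifted by 12 hours


def dedicated_capacity(*demands):
    """Machines needed if every company buys enough for its own peak."""
    return sum(max(demand) for demand in demands)


def pooled_capacity(*demands):
    """Machines needed if capacity is shared: the peak of the combined demand."""
    return max(sum(hour) for hour in zip(*demands))


print(dedicated_capacity(company_a, company_b))  # 20
print(pooled_capacity(company_a, company_b))     # 11
```

The saving comes entirely from the peaks not overlapping. If both companies peaked in the same hours, the pooled figure would rise to 20 as well.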

The most important point about elasticity is that the 10 machines serving a company during the day shrink to 1 machine at night, automatically, based on load. The 9 machines that are released then start serving the other company, along with the 1 machine it was already using. This property of the cloud, which allows these 2 tenants (or more) to share processing capacity, is called shared-process multitenancy.

So why not stateful architecture?
Let's say 1000 users are distributed across these 10 machines during the day, i.e. 100 users on each machine. In a stateful architecture, each of those users will be served only by the server that created the user's session at login. This happens because the session state is stored in the memory of that server. The HTTP load balancer enforces this routing, which is called session stickiness.
When night falls, let's assume that 900 users log out of the system while the remaining 100 users stay logged in. Ideally, 1 machine should be enough to serve all of them. However, these users could be distributed across all 10 servers, with 10 users on each. So shrinking back to 1 machine is not possible, i.e. it breaks elasticity.
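A small simulation makes the problem concrete. It is only a sketch: the modulo rule standing in for session stickiness and the choice of which 100 users stay logged in are assumptions picked so that the remaining users land evenly, as in the scenario described here.

```python
from collections import Counter

SERVERS = 10


def sticky_server(user_id):
    """Hypothetical stickiness rule: a user always goes back to the server
    that holds the session in memory."""
    return user_id % SERVERS


def users_per_server(logged_in_users):
    """Count how many logged-in users are pinned to each server."""
    return Counter(sticky_server(user) for user in logged_in_users)


day = users_per_server(range(1000))   # 1000 users logged in
night = users_per_server(range(100))  # 900 logged out, 100 remain

print(len(day), day[0])      # 10 100
print(len(night), night[0])  # 10 10
```

At night every server still holds 10 live sessions, so none of the 10 machines can be released even though the total load would fit on one.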
One way to solve this problem is to replicate session state across all 10 servers. This way a user can be served by any of the 10 servers. But each server will then use 10 times the memory it would need for just its own users' session state. Worse, the overhead grows with the cluster: as more users arrive and more servers are added, every server has to hold the state of every user. So this doesn't work for an exploding number of tenants signing up for your application in shared-process multitenancy.
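The memory cost of full replication can be put in numbers with a rough sketch. It counts sessions rather than bytes, and the function name is made up for illustration.

```python
def sessions_held_per_server(total_users, servers, replicated):
    """Sessions each server keeps in memory.

    Without replication a server holds only its own share of the users;
    with full replication it holds a copy of every user's session.
    """
    if replicated:
        return total_users
    return total_users // servers


# The example from the text: 1000 users on 10 servers.
print(sessions_held_per_server(1000, 10, replicated=False))  # 100
print(sessions_held_per_server(1000, 10, replicated=True))   # 1000

# Doubling users and servers keeps the sticky share flat,
# but doubles what every replicated server must hold.
print(sessions_held_per_server(2000, 20, replicated=False))  # 100
print(sessions_held_per_server(2000, 20, replicated=True))   # 2000
```

With replication, adding servers no longer lowers the memory needed per server, which is why the approach stops paying off as the number of tenants grows.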

How does this change with statelessness?
With a stateless application, you can execute a user request anywhere - session stickiness is no longer relevant. So when the number of users drops from 1000 to 100, you can immediately release 9 servers to the other company and serve the 100 users from 1 server.
It sounds simple, but in practice it is not. All applications need state. So if the application server is not going to handle it, someone else has to, and that will be the database. Current databases have enough scaling problems already, so adding state management to their woes is a non-starter. Hence NoSQL "distributed" datastores.
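Here is a minimal sketch of the idea, assuming a shared store that every application server can reach. `SessionStore` and `StatelessServer` are hypothetical names, and the in-memory dictionary only stands in for a distributed datastore; it is not any vendor's API.

```python
import copy


class SessionStore:
    """In-memory stand-in for a shared datastore (a real one would be a
    network service reachable from every application server)."""

    def __init__(self):
        self._data = {}

    def load(self, user_id):
        # Return a copy, as a remote store would.
        return copy.deepcopy(self._data.get(user_id, {}))

    def save(self, user_id, state):
        self._data[user_id] = copy.deepcopy(state)


class StatelessServer:
    """Application server that keeps no user state between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # the only place user state lives

    def add_to_cart(self, user_id, item):
        state = self.store.load(user_id)           # fetch state for this request
        state.setdefault("cart", []).append(item)  # do the work
        self.store.save(user_id, state)            # write it back before replying
        return state["cart"]


store = SessionStore()
server_1 = StatelessServer("server-1", store)
server_2 = StatelessServer("server-2", store)

server_1.add_to_cart("alice", "book")
cart = server_2.add_to_cart("alice", "pen")  # a different server, same user
print(cart)  # ['book', 'pen']
```

Because neither server remembers anything between requests, either one can be shut down at any time without losing a session. The cost is an extra read and write against the store on every request, which is the load the paragraph above says conventional databases struggle to absorb.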

You would see this as the common architectural pattern across PaaS vendors:

* Google App Engine - stateless requests + Bigtable
* Microsoft Azure - web role + Azure storage/SQL
* Of course, my company - OrangeScape that runs on top of GAE/Amazon EC2.
* and I would expect the same from VMforce - Spring stateless session beans + DB

So build for the cloud! Stateful apps with SQL databases are for oldies listening to "enterprise" tunes. (Sorry! Couldn't resist it.)
