Tuesday, February 1, 2011

Understanding Cloud Basics - PAAS

In the last entry we saw how IAAS is provided by the service provider and the agility it brings to IT infrastructure. In this post the aim is to go a notch higher and see how platform as a service works and how it simplifies a developer's life.
One thing everyone should understand is that it's all about business: PAAS matters because developers can focus on building apps, which is their core function, and not worry about the systems underneath.
PAAS is platform as a service. It includes the following features -
a) Data management
b) Web management
c) Application Management
In essence, all the development-system plumbing is encapsulated or automated and is completely hidden from the end developer.
Let's take an example to understand what goes in to building a PAAS
A typical 3-tier deployment to build a PAAS is as follows -
[Diagram: a typical 3-tier deployment behind a PAAS]

PAAS Providers -
[Image: PAAS provider stack]
However, when you use a PAAS provider like Google App Engine or Windows Azure, these complexities are hidden from you.
All you need to do is deploy your code. Ideally, PAAS characteristics are as follows -
1) Auto Scale
2) Self Healing
3) Pay as you use
(In fact, Amazon's Elastic Beanstalk itself is free; you pay only for the underlying resources.)
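To see how small the developer-facing surface is, here is a minimal sketch of the kind of code you deploy to a Python PAAS - a bare WSGI app (the standard interface Google App Engine's Python runtime is built on). Everything else - servers, scaling, health monitoring - is the platform's job. The handler below is illustrative, not tied to any one provider:

```python
# A minimal WSGI application - roughly all the "system" code a
# developer ships to a Python PAAS. The platform provides the HTTP
# server, load balancing, scaling, and monitoring around it.
def application(environ, start_response):
    """Handle one HTTP request and return the response body."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from a PAAS-hosted app!"]

# Locally you could serve it with the stdlib reference server:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8080, application).serve_forever()
# On a PAAS, that serving loop is exactly the part you never write.
```

The point of the auto-scale and self-healing characteristics above is that many copies of this one function can be started and stopped by the platform without the developer doing anything.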

Following are a few of the PAAS providers in cloud computing -
1) CumuLogic - Java PAAS
2) CloudBees - Java PAAS
3) OrangeScape - Google App Engine, Windows Azure
4) Amazon AWS - Elastic Beanstalk and RDS
5) Heroku - Ruby on Rails PAAS
6) Google App Engine - Python PAAS
7) CA 3Tera AppLogic - PAAS-enabling cloud software

I hope this was an easy-to-understand introduction to PAAS; look out for SAAS in the next entry.

Monday, January 31, 2011

Understanding Cloud Basics - Infrastructure As a Service

I have been getting multiple requests to explain the cloud in a very basic way and in steps.
So, starting with this post, we will try to understand the very basics of cloud computing - Infrastructure as a Service.
Infrastructure as a service, simply put, is provisioning and using infrastructure on the fly and paying per usage of that infrastructure. The infrastructure includes the following things -
1) Servers, 2) Operating systems, 3) VLANs, 4) LAN, 5) WAN, 6) Firewall, and maybe more.
A traditional IT infrastructure consists of the following things -
1) Physical Servers, 2) Network Switches, 3) Firewalls.
A typical provisioning of the infrastructure involves the following activities -
1) Order Physical servers, switches and firewalls
2) Rack up Physical servers.
3) Setup server network
4) Setup intranet connectivity
5) Setup firewall network
6) Setup internet connectivity
7) Install operating systems on each server
Unless you have automation software, most of the above activities are manual and take a good amount of time to get done. Let's look at typical timelines for the same -

Sr No  Activity           Time to accomplish  Comments
1      Hardware ordering  3-4 weeks           Typical delivery time
2      Rack and power     1 day
3      Set up server      1 day
4      Set up intranet    1-2 days
5      Set up firewall    1-3 days
6      Set up internet    1-2 days

The largest share of the time goes into waiting for the hardware to arrive.

When you switch to the cloud, the activity time reduces to the following, assuming the service provider is already selected -

Sr No  Activity             Time to accomplish
1      Sign up the account  30 min - 1 hr
2      OS image creation    2-3 hrs
3      Setup security       2-3 hrs

So at most within a day your setup is ready, and when you want to scale, this time goes down further to not more than 30 minutes, as the base setup is ready and re-usable.
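The two tables reduce to a quick back-of-the-envelope comparison. A small sketch, taking the worst-case figure from each row and counting shipping delays in calendar hours:

```python
# Worst-case provisioning time, traditional vs cloud, in hours,
# using the figures from the two tables above.
HOURS_PER_DAY = 24  # calendar time: hardware shipping runs around the clock

traditional_hours = {
    "hardware ordering": 4 * 7 * HOURS_PER_DAY,  # 3-4 weeks, take 4
    "rack and power": 1 * HOURS_PER_DAY,
    "set up server": 1 * HOURS_PER_DAY,
    "set up intranet": 2 * HOURS_PER_DAY,
    "set up firewall": 3 * HOURS_PER_DAY,
    "set up internet": 2 * HOURS_PER_DAY,
}
cloud_hours = {
    "sign up the account": 1,
    "OS image creation": 3,
    "setup security": 3,
}

trad_total = sum(traditional_hours.values())
cloud_total = sum(cloud_hours.values())
print(trad_total, cloud_total)    # 888 vs 7 hours
print(trad_total // cloud_total)  # over 100x faster to the first server
```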

So how does the magic happen?
There is no magic. It's a combination of the things below -
1) Service Provider
2) Provisioned hardware capacity
3) Virtualized infrastructure
4) Clouded APIs :)

What happens in the background is as follows -
The service provider builds the cloud services as follows -
[Diagram: how the service provider assembles the IAAS cloud]

And there you have it. Check out the following clouds -
[List of IAAS providers]

There are many more Infrastructure as a Service providers, but the above gives you an idea of using the cloud for Infrastructure as a Service.

Feel free to comment on the post if you'd like to see more details on this topic.

Monday, January 24, 2011

Applications have to be built stateless to scale and shrink on the cloud

Context of the presentation
The cloud is successful with the pay-per-use model in software because it invented elasticity:

* If 2 companies in opposite time zones need 10 machines during the day and 1 machine at night, a cloud provider can serve both companies with 11 machines - 10 for one and 1 for the other at any given point in time.
* If each had run them in its own enterprise, each would have bought 10 machines (20 in total), and 9 machines would have sat idle at night.
* Time zone is not the only reason for sharing idle capacity:

o Computing needs could also be seasonal. Some businesses might need very high computing capacity during Christmas, while others might need it during the financial-year close, and so on.
o Some may not be able to predict when they need that capacity, e.g. the slashdot effect. Either way, by sharing, you let others use your idle capacity. This enables pay-per-use.
* You can extend the same concept to the cost of the platforms - application servers, database and applications.

The most important point about elasticity is that the 10 machines serving a company during the day will shrink to 1 machine at night automatically, based on load. The 9 machines that are released will then start serving the other company, alongside the 1 machine it was already using. This property of the cloud that allows these 2 (or more) tenants to share processing capacity is called shared-process multitenancy.
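The sharing arithmetic above is easy to verify hour by hour. A small sketch (the peak hours chosen are arbitrary, for illustration only):

```python
# Two tenants, 12 time zones apart: each needs 10 machines during its
# local daytime and 1 at night. Check the combined demand per UTC hour.
def machines_needed(hour, daytime_hours):
    """Machines one tenant needs at a given hour: 10 at peak, 1 off-peak."""
    return 10 if hour in daytime_hours else 1

tenant_a_day = set(range(8, 20))               # tenant A's daytime hours
tenant_b_day = set(range(24)) - tenant_a_day   # tenant B, opposite zone

shared_peak = max(machines_needed(h, tenant_a_day) +
                  machines_needed(h, tenant_b_day) for h in range(24))
dedicated_total = 10 + 10   # each company buying for its own peak

print(shared_peak)      # 11 machines cover both tenants at every hour
print(dedicated_total)  # 20 machines if each ran its own hardware
```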

So why not stateful architecture?
Let's just say 1000 users are distributed across these 10 machines during the day, i.e. 100 users on each machine. In a stateful architecture, those 100 users can be served only by the server that created their sessions at login, because the session state is stored in that server's memory. The HTTP load balancer enforces this, and it is called session stickiness.
When night falls, let's assume 900 users log out while the remaining 100 stay logged in. Ideally, 1 machine should be enough to serve them all. However, these users could be spread across all 10 servers, 10 users each, so shrinking back to 1 machine is not possible - it breaks elasticity.
One way to solve this problem is to replicate session state across all 10 servers, so any of the 10 can serve any user. But then each server holds 10x the memory it would need for just its own users' session state, and the overhead grows with every server added to the cluster as more users join the system. So this doesn't work for the exploding number of tenants signing up for your application under shared-process multitenancy.
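The memory cost of the replication workaround is simple to quantify. A sketch, assuming 1000 sessions of about 10 KB each spread over 10 servers (the sizes are made-up illustrative numbers):

```python
# Per-server session memory: sticky sessions vs full replication.
SESSION_KB = 10   # assumed size of one user's session state
USERS = 1000
SERVERS = 10

sticky_kb_per_server = (USERS // SERVERS) * SESSION_KB  # only its own 100 users
replicated_kb_per_server = USERS * SESSION_KB           # every session, everywhere

print(sticky_kb_per_server)      # 1000 KB
print(replicated_kb_per_server)  # 10000 KB - 10x more, and the factor
                                 # grows with every server added
```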

How does this change with statelessness?
With a stateless application, you can execute a user request anywhere - session stickiness is no longer relevant. So when the number of users drops from 1000 to 100, you can immediately release 9 servers to the other company and serve the 100 users from 1 server.
It sounds simple, but in practice it is not. All applications need state, so if the application server is not going to handle it, someone else has to - and that will be the database. Current databases have enough scaling problems already, so adding state management to their woes is a non-starter; hence NoSQL "distributed" datastores.
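A minimal sketch of the stateless pattern (an in-process dict stands in here for the shared datastore - Big Table, Azure storage, and so on - that a real deployment would use):

```python
# Stateless servers: session state lives in a shared store, so any
# server can handle any request and servers can be released at will.
session_store = {}  # stand-in for an external, shared datastore

def handle_request(server_id, session_id):
    """Fetch state from the shared store, update it, write it back.
    Nothing is kept in the server's own memory between requests."""
    count = session_store.get(session_id, 0) + 1
    session_store[session_id] = count
    return f"request {count} for {session_id} handled by {server_id}"

# No stickiness: the same session hops between servers freely.
print(handle_request("server-A", "user-42"))
print(handle_request("server-B", "user-42"))  # different server, same state
print(session_store["user-42"])               # 2
```

Because every server reads and writes the same store, shutting down server-A between the two requests would change nothing for the user - which is exactly what elastic shrinking requires.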

You would see this as the common architectural pattern across PaaS vendors:

* Google App Engine - stateless requests + Big Table
* Microsoft Azure - web role + Azure storage/SQL
* Of course, my company - OrangeScape, which runs on top of GAE/Amazon EC2.
* and I would expect the same from VMforce - Spring stateless session beans + Force.com DB

So build for the cloud! Stateful apps with SQL databases are for oldies listening to "enterprise" tunes. (Sorry! Couldn't resist it.)

Source - http://manidoraisamy.blogspot.com/2010/07/why-does-elastic-nature-of-cloud-impose.html

Thursday, January 20, 2011

Amazon Launches PAAS as Elastic Beanstalk!

Service Highlights

Easy to begin – Elastic Beanstalk is a quick and simple way to deploy your application to AWS. Use the AWS Management Console or an integrated development environment (IDE) such as Eclipse to upload your application, and Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring. Within minutes, your application will be ready to use without any infrastructure or resource configuration work on your part.

Impossible to outgrow – Elastic Beanstalk automatically scales your application up and down based on default Auto Scaling settings. You can easily adjust Auto Scaling settings based on your specific application's needs. For example, you can use CPU utilization to trigger Auto Scaling actions. With Elastic Beanstalk, your application can handle peaks in workload or traffic while minimizing your costs.
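A CPU-utilization trigger of the kind described can be sketched as a simple scaling rule. The thresholds, step size, and instance limits below are illustrative assumptions, not Elastic Beanstalk's actual defaults:

```python
# Toy Auto Scaling decision: add an instance when CPU runs hot,
# remove one when it idles, and stay within configured bounds.
def desired_instances(current, cpu_percent,
                      scale_out_above=70.0, scale_in_below=30.0,
                      min_instances=1, max_instances=4):
    """Return how many instances the group should run next."""
    if cpu_percent > scale_out_above:
        current += 1
    elif cpu_percent < scale_in_below:
        current -= 1
    return max(min_instances, min(max_instances, current))

print(desired_instances(1, 85.0))  # 2 - traffic spike, scale out
print(desired_instances(2, 10.0))  # 1 - quiet period, scale in
print(desired_instances(4, 95.0))  # 4 - capped at max_instances
```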

Complete control – Elastic Beanstalk lets you "open the hood" and retain full control over the AWS resources powering your application. If you decide you want to take over some (or all) of the elements of your infrastructure, you can do so seamlessly by using Elastic Beanstalk's management capabilities. For example, you can browse log files, monitor application health, adjust auto-scaling rules, set up email notifications, and even pass environment variables through the Elastic Beanstalk console.

Flexible – You have the freedom to select the Amazon EC2 instance type that is optimal for your application based on CPU and memory requirements, and can choose from several available database options. For example, you can specify a deployment consisting of high-memory instances if your web application has a large memory footprint.

Reliable – Elastic Beanstalk runs within Amazon's proven network infrastructure and datacenters, and provides an environment where developers can run applications requiring high durability and availability.

There is no additional charge for Elastic Beanstalk – you only pay for the underlying AWS resources (e.g. Amazon EC2, Amazon S3) that your application consumes.

New AWS customers who are eligible for the AWS free usage tier can deploy an application in Elastic Beanstalk for free, as the default settings for Elastic Beanstalk allow a low traffic application to run within the free tier without incurring charges. If these applications require more resources than the default environment provides, customers will be charged the normal AWS rates for the incremental resources the application consumes.

The costs of running a web site using Elastic Beanstalk can vary based on several factors such as the number of EC2 instances needed to handle your web site traffic, the bandwidth consumed by your application, and which database or storage options your application uses. The principal costs for a web application will typically be for the EC2 instance(s) and for the Elastic Load Balancing that balances traffic between the instances running your application.

The table below is an example which shows the monthly costs of running a low traffic web site using the Elastic Beanstalk default settings, both with and without the AWS free tier:
Service and Resource                   Unit   Cost Breakout                              Cost
Amazon EC2 t1.micro instance           1      $0.02/hr * 24 hours * 30 days              $14.40
Elastic Load Balancer                  1      $0.025/hr * 24 hours * 30 days             $18.00
Elastic Load Balancer Data Processing  15 GB  $0.008/GB * 15 GB                          $0.12
Elastic Block Store volume             8 GB   $0.10/GB * 8 GB                            $0.80
S3 Storage for WAR File and Access     1 GB   $0.14/GB + $0.01 for <1k PUTs, <10k GETs   $0.15
Bandwidth In and Out                   15 GB  15 GB in * $0.10, 15 GB out * $0.15        $3.75
Total Monthly Cost without Free Tier                                                     $37.22
Total Monthly Cost with Free Tier                                                        $0
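The table's arithmetic checks out; a few lines reproduce the monthly total (unit prices as quoted in the 2011 table - they change over time):

```python
# Monthly cost of the default low-traffic Elastic Beanstalk setup,
# line by line, using the prices from the table above.
HOURS_PER_MONTH = 24 * 30

line_items = {
    "EC2 t1.micro instance":   0.02 * HOURS_PER_MONTH,   # $14.40
    "Elastic Load Balancer":   0.025 * HOURS_PER_MONTH,  # $18.00
    "ELB data processing":     0.008 * 15,               # $0.12
    "EBS 8 GB volume":         0.10 * 8,                 # $0.80
    "S3 storage and requests": 0.15,
    "bandwidth in and out":    15 * 0.10 + 15 * 0.15,    # $3.75
}

total = round(sum(line_items.values()), 2)
print(total)  # 37.22 - matching the "without free tier" row
```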

Source - http://aws.amazon.com/elasticbeanstalk/

Hadoop Summit coming to Bangalore, India

Yahoo! India R&D is proud to announce the Hadoop India Summit 2011, a one-day event which will take place in Bangalore, India, on February 16th. Last year’s event was a fantastic success with 300+ researchers and industry leaders discussing real-world use cases for Hadoop. This year's event is planned to be bigger and better, with 500+ attendees expected. You will have the opportunity to learn and hear from experts working on large data storage and analysis problems across industries such as financial services, telecommunications, government, retail and academia.

Apache Hadoop has become the de-facto platform for developing large-scale, data-intensive applications. It has been used actively in academia and Industry for research and data mining. Our Hadoop Summit will provide an opportunity to understand the latest trends and roadmap for the Hadoop platform and its ecosystem, and how Hadoop is leveraged in various domains. Whether you are already developing Hadoop-based applications or exploring how to adopt Apache Hadoop for your business, don't miss this opportunity to learn about interesting, relevant real-world applications as well as the latest research.

Please note that attendance at this event is by registration only and space is limited. Please register now on Eventbrite to reserve a place.

* What: Hadoop Summit Conference 2011
* Registration: http://hadoopindiasummit.eventbrite.com/
* When: February 16th, 2011, 8:30 AM to 6:00 PM IST
* Where: J N Tata Auditorium, Indian Institute of Science, Bangalore (View Larger Map)
* Contact: yahoohadoop@yahoo-inc.com


08:30 AM - 09:00 AM Registration and Coffee
09:00 AM - 09:15 AM Welcome Speech
Hari Vasudev - VP, Cloud Platform Group, Yahoo!
09:15 AM - 09:45 AM Keynote Address
Todd Papaioannou - VP, Cloud Architecture, Yahoo!
09:45 AM - 10:15 AM Keynote Address
Prof. D. Janakiram - Department of CSE, Indian Institute of Technology (IIT), Madras
10:15 AM - 10:45 AM Keynote Address
Sundara Nagarajan - Director of R&D, Storage Works Division, HP
10:45 AM - 11:00 AM Coffee Break
11:00 AM - 11:30 AM Keynote Address
Sanjay Radia - Cloud Architect, Yahoo!
11:30 AM - 12:00 PM Scaling Hadoop
Dr. Milind Bhandarkar - LinkedIn
12:00 PM - 12:15 PM Lightning talk #1
12:15 PM - 12:30 PM Lightning talk #2
12:30 PM - 01:15 PM Lunch Break
The afternoon sessions run in three parallel tracks - Platform, Application, and Research:

01:15 PM - 01:45 PM
  Platform: Hadoop NextGen - Sharad Agrawal, Yahoo!
  Application: Data Analytics with Hadoop - Michael McIntire, Yahoo!
  Research: Middleware Frameworks for Adaptive Executions and Visualizations of Climate and Weather Applications on Grids - Dr. Sathish Vadhiyar, IISc Bangalore
01:45 PM - 02:15 PM
  Platform: Pig, Making Hadoop Easy - Alan Gates, Yahoo!
  Application: Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce - Vivek Ratan, Amazon
  Research: Comparison between Extension of Fairshare Scheduler and a Novel SLA-based Learning Scheduler in Hadoop - Dr. G Sudha Sadasivam, N Priya - PSG Tech, Coimbatore
02:15 PM - 02:45 PM
  Platform: Data on Grid (GDM) - Venkatesh S, Yahoo!
  Application: Hadoop Avatar at eBay - eBay
  Research: VirtPerf: A Capacity Planning Tool for Virtual Environments - Dr. Umesh Bellur - IIT Bombay
02:45 PM - 03:15 PM
  Platform: Hive - Namit J, Facebook
  Application: Feeds Processing at Yahoo!: One Hadoop, One Platform, 2 Systems - Jean-Christophe Counio, Yahoo!
  Research: Scheduling in MapReduce using Machine Learning Techniques - Dr. Vasudev Varma, IIIT Hyderabad
03:15 PM - 03:30 PM Coffee Break
03:30 PM - 04:00 PM
  Platform: Making Hadoop Secure - Devaraj Das, Yahoo!
  Application: Hadoop 101 - Basant Verma, Yahoo!
  Research: DRDO Labs
04:00 PM - 04:30 PM
  Platform: Simulation and Performance - Ranjit, Yahoo!
  Application: Searching Information Inside Hadoop Platform - Abinash, Director of Technology, Bizosys Technologies
  Research: Provisioning Hadoop's Map Reduce in Cloud for Effective Storage as a Service - Dr. Shalinie S.M., TCE Madurai
04:30 PM - 05:00 PM
  Platform: Oozie - Workflow for Hadoop - Andreas N, Yahoo!
  Application: Data Integration on Hadoop - Sanjay Kaulskar, Informatica
  Research: Framework for a Suite of Algorithms for Predictive Modeling on Hadoop - Vaijanath Rao, Rohini Uppuluri - AOL India
05:00 PM - 05:45 PM Panel Discussion
05:45 PM - 06:00 PM Closing & Coffee

Source - http://developer.yahoo.com/blogs/ydn/posts/2011/01/hadoop-india-summit-2011/