Comparing Features Physical/Virtual/Cloud Environments

I’m preparing a talk called “Can Cloud Computing Save the World?”, so I have been researching how physical servers, virtual servers, and cloud services compare feature by feature.  To a certain extent this data collection is vulnerable to individual bias, so I wanted feedback on whether I’m missing any key points, overemphasizing others, etc.

Physical Servers

Costs
  • Initial outlay for hardware can be large.  However, it can be more cost effective in the long term compared to hosting costs.
  • Requires rack space/power/network/cooling, which you’re either paying to have hosted or paying for yourself.
Availability
  • Depends on the facility.  Quality hardware does not fail often (redundant power supplies, redundant storage, etc).  Most common cause of downtime is power reliability.
  • In theory the OS on traditional hardware has the same reliability as a virtualized OS.  Reliability of that OS compared to the OS running your cloud infrastructure is debatable.
Scalability
  • Expensive and time consuming to scale out.
  • Easy to scale up to a certain point, but can be time consuming.
Manageability
  • Still have to manage everything (hardware, OS, application, physical environment).

Virtual Servers

Costs
  • Should be less than straight physical servers because of consolidation.
  • Licensing for VM software can be expensive.
  • Still has the infrastructure dependencies of physical servers.
Availability
  • Adds a dependency on virtualization software, which could increase downtime.  However, virtualization software is generally very stable.
  • In an HA virtualization environment reliability can be increased because virtual machines can move to new hosts in the event of hardware failures.
Scalability
  • Can scale up a virtual machine quickly if resources are available on the host machine (e.g. memory can be added instantly).
  • Provides leeway in scaling out hosts (add host, move virtual machines to host to spread out load).
  • Requires a large investment to provide the most flexibility (e.g. SAN required for moving VMs between hosts).
Manageability
  • Templates simplify deployments of new VMs compared to deployments to hardware.
  • Must manage the virtualization technology.
  • Decreases the amount of hardware to manage.
  • Still have to manage everything (hardware, OS, application, physical environment).

Virtual Server Solutions

VMware ESX
  • Provides the most feature rich solution.
  • Will probably be the most expensive solution.
  • Must learn the VMware OS and application.
Microsoft Hyper-V
  • Comes free(ish) with Windows Server 2008.
  • Live migration capabilities cause network connections to drop while the guest moves.  Windows 2008 R2 will address this issue.
  • Really good fit for Microsoft shops.

Cloud Computing

Costs
  • Costs start very attractive.  You only pay for what you use.
  • More expensive over the long term than other solutions.
  • Pricing model fits fluctuating needs the best.  Predictable needs will get better costs from other solutions.
Availability
  • Depends on the cloud provider.
  • Depending on the application, adds a dependency on a working internet connection.  Problematic for many line of business apps.  Not a problem for web based companies.
  • Longevity of the cloud provider is a concern.  You may want to architect/look for a solution that allows easy transport to a new provider or to a physical/virtual solution.
Scalability
  • This is the strong suit of cloud computing.  Can scale up the number of instances hosting your application very quickly.
  • Must be conscientious about architecting and developing the application to leverage the scalability features of the cloud provider.
Manageability
  • Depends on the cloud provider.  Goes from very little management overhead to managing the entire OS stack.

Cloud Computing Solutions

Microsoft Azure
  • Strong candidate for .NET and PHP programmers.
  • IIS-based cloud provider.  No OS management.
  • Costs have not been announced.
  • Strong development suite in Visual Studio, including a local cloud fabric for development.
  • Azure fabric will be available for companies to host themselves.
  • Storage, relational database (MS SQL Server), and non-relational database are part of the solution.
  • Worker Role provides non-web-based processing (e.g. scheduled background tasks).

Google App Engine
  • Strong candidate for Python and Java programmers.
  • Free for smaller apps.  Charges for larger usage.
  • No background processing.
  • Storage and non-relational database are part of the solution.

Amazon Web Services
  • OS-based cloud provider.  Full access to the OS (Windows or Linux).
  • Costs are complicated.  They range from $27/month pre-paid per small instance to $2300/month for an on-demand extra large instance.
  • Storage, content distribution, non-relational database, message queue, monitoring, auto-scaling, and load balancing are all part of the solution, and all have associated costs.
  • Most flexible solution in what you want to do, but the most work to set up and maintain.
  • Industry leading cloud solution.

GoGrid
  • Lower costs than AWS, but lacks some features of AWS.
  • Cannot save a template of a machine; can only use stock templates.
  • Load balancing provided for free.
  • Relational databases only, on an instance you host yourself in the cloud.

Conclusion

In my opinion the debate between physical and virtual servers is trivial.  Go with a virtual server unless your load is so high that you cannot put multiple guests on a single piece of hardware.  95% of the time that rule of thumb holds true.

Choosing to go with the cloud comes down to some basic questions:

Flowchart

My flow chart is a little hard to read but if you save the JPEG file locally and view it you can zoom in better.  The basic questions are:

Can you afford physical servers?  (If not, look at the cloud.)

  • Do you have regulatory or security issues with your data being in the cloud (even if it’s encrypted)?
  • Does your application need to be more reliable than the internet (e.g. an accounting application for an accounting firm)?
  • Can you easily predict what your load will be for the next 3 years?

If you answer yes to all three of those you probably do not want to use the cloud.  Otherwise take a look, it might be a very good fit for you.


Amazon Elastic Load Balancer Setup

As I previously wrote, Amazon announced a load balancing solution called Elastic Load Balancer.  While this may prove to be a great addition to AWS, currently none of the GUI tools (including the AWS Console provided by Amazon) have built-in functionality to create ELB instances.

So I became motivated to finally get comfortable with the EC2 API, allowing me to call EC2 commands from my Windows command line.  I wrote a post detailing how to set up your command line environment for the EC2 API: Setting Up EC2 Command Line Tools on Windows.

Now armed with a load balancing solution and a working Windows command line, I wanted to delve into ELB and see what it has to offer.

ELB Documentation

Amazon Web Services in general has excellent documentation.  ELB is no exception.  Probably the most important document you can read is the ELB Quick Reference Card.  This one page sheet shows you all the ELB related commands and their argument options.

ELB Architecture

First a quick overview of the architecture of ELB.  Think of an ELB instance as sitting in front of your EC2 instances.  ELB routes traffic to the instances you register with it.  The ELB instance has its own IP address and public DNS name.

As we can see from the diagram the load balancer directs traffic to different instances, even across different availability zones.

One thing to keep in mind is that the requests are balanced between different availability zones and then evenly between the instances of that zone.  So if you have 10 instances in us-east-1a and 5 instances in us-east-1b your us-east-1b instances will service twice as much traffic per instance.  For that reason it is suggested that you keep your number of instances in each zone roughly equal.
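To make the arithmetic concrete, here is a minimal sketch of that split.  It assumes traffic is divided evenly across zones first and then evenly across the instances in each zone, exactly as described above; the function name is my own and this is a model of the behavior, not anything Amazon publishes.

```python
def per_instance_share(instances_per_zone):
    """Return the fraction of total traffic each instance in a zone receives.

    Assumes an even split across zones, then an even split across the
    instances inside each zone (the behavior described in the text).
    """
    # Zones with no registered instances receive no traffic in this model.
    zones = {zone: count for zone, count in instances_per_zone.items() if count > 0}
    if not zones:
        raise ValueError("at least one zone must have an instance")
    zone_share = 1.0 / len(zones)
    return {zone: zone_share / count for zone, count in zones.items()}


shares = per_instance_share({"us-east-1a": 10, "us-east-1b": 5})
for zone, share in sorted(shares.items()):
    print(f"{zone}: {share:.1%} of total traffic per instance")
```

With 10 and 5 instances, each us-east-1a instance handles 5% of the traffic and each us-east-1b instance handles 10%, which is the "twice as much" imbalance mentioned above.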

When you create the ELB instance it will give you the public DNS name for the instance.  That DNS name will remain the same for the life of the instance.  You will want to create a CNAME record in DNS to point your branded URL (www.mysite.com) to the “ugly” DNS name that EC2 provides you.

Creating ELB Instance

To create an ELB instance, first ensure that your command line environment is configured to work with the EC2 API and the ELB API.  I suggest you read my previous article, Setting Up EC2 Command Line Tools on Windows, if you have never used an EC2 command line tool before.

The command for creating an ELB instance is elb-create-lb.  The parameters available on this command are:

<default> Name of Load Balancer; I suggest you use the DNS name of the public service you will be exposing through this ELB instance
--availability-zones Comma delimited list of zones to allow registered EC2 instances in
--listener "protocol=value, lb-port=value, instance-port=value" This defines which protocol and port the ELB instance will listen on, and which port on the instances to send the traffic to.  You can have as many --listener parameters as you want.  For example you could configure an ELB instance to listen on ports 80 and 443.

First let’s create an ELB instance to listen for HTTP traffic:

D:\aws>elb-create-lb Test --availability-zones us-east-1a,us-east-1b --listener "protocol=http,lb-port=80,instance-port=80"

DNS-NAME  Test-1736333854.us-east-1.elb.amazonaws.com

As you can see it returns the public DNS name associated with this instance.

Here we create an ELB instance to listen for HTTP and HTTPS traffic:

D:\aws>elb-create-lb Test --availability-zones us-east-1a,us-east-1b --listener "protocol=http,lb-port=80,instance-port=80" --listener "protocol=tcp,lb-port=443,instance-port=443"

DNS-NAME Test-851384903.us-east-1.elb.amazonaws.com

Notice on the protocols we specify HTTP for HTTP traffic, but TCP for HTTPS traffic.  HTTP and TCP are the only protocols supported.
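If you end up scripting this, a small helper can assemble the argument list for you.  The sketch below only builds the arguments for elb-create-lb using the parameters described above; the helper name is hypothetical, and it assumes the double-hyphen spelling of the flags (--availability-zones, --listener).

```python
def build_create_lb_args(name, zones, listeners):
    """Build the argument list for elb-create-lb.

    listeners is a list of (protocol, lb_port, instance_port) tuples;
    each tuple becomes one --listener parameter.
    """
    args = ["elb-create-lb", name, "--availability-zones", ",".join(zones)]
    for protocol, lb_port, instance_port in listeners:
        args.append("--listener")
        args.append(
            f"protocol={protocol},lb-port={lb_port},instance-port={instance_port}"
        )
    return args


# HTTP on port 80, and HTTPS passed through as plain TCP on port 443.
example = build_create_lb_args(
    "Test",
    ["us-east-1a", "us-east-1b"],
    [("http", 80, 80), ("tcp", 443, 443)],
)
print(" ".join(example))
```

Keeping the listeners as data makes it harder to forget that HTTPS has to be declared with protocol=tcp.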

Create CNAME Record for ELB Instance

When you create an ELB instance it provides you a public DNS name.  However, these names are not user friendly, so you will want to create a CNAME record in DNS to point your friendly URL at your EC2 hosted website.

How you create the CNAME record depends on who is hosting DNS for you.  However here is the output of my test website I configured for this tutorial:

D:\aws>nslookup

Default Server: ip-172-16-0-23.ec2.internal

Address: 172.16.0.23

>aws.LoudSteve.com

Server: ip-172-16-0-23.ec2.internal

Address: 172.16.0.23

Name: Test-5660601.us-east-1.elb.amazonaws.com

Address: 174.129.195.68

Aliases: aws.LoudSteve.com

If you delete your ELB instance and recreate it you will get a new public DNS name and will have to update your CNAME record.

Register EC2 Instance with Load Balancer

Now that you have an ELB instance you need to register EC2 instances with the load balancer.  The command to register an EC2 instance with the ELB instance is elb-register-instances-with-lb.  The parameters available on this command are:

<default> Name of Load Balancer instance to register EC2 instances with.
--instances Comma separated list of instance IDs

First we need to get a list of our instances because we need the instance ID to register them with the ELB instance.  We do this with ec2-describe-instances from the EC2 API:

D:\aws>ec2-describe-instances

<Lots of Stuff>

INSTANCE i-ed156e84   ami-da4daab3

<Lots of Stuff>

INSTANCE i-ef156e86   ami-da4daab3

<Lots of Stuff>

I removed quite a bit from the actual output to help with readability.  The part you want to focus on is where it says “INSTANCE i-**********”.  That is the information you need for each instance.
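Picking the IDs out by eye gets tedious with more than a few instances.  Here is a minimal sketch that pulls them out of the saved command output.  It assumes, based only on the trimmed output shown above, that each instance appears on a line starting with INSTANCE and that the ID is the second whitespace-separated field; the function name is my own.

```python
def instance_ids(describe_output):
    """Extract instance IDs from ec2-describe-instances text output."""
    ids = []
    for line in describe_output.splitlines():
        fields = line.split()
        # Only lines whose first field is INSTANCE carry an instance ID.
        if len(fields) >= 2 and fields[0] == "INSTANCE":
            ids.append(fields[1])
    return ids


sample = """\
<Lots of Stuff>
INSTANCE i-ed156e84   ami-da4daab3
<Lots of Stuff>
INSTANCE i-ef156e86   ami-da4daab3
"""
# Joined with commas and no spaces, ready for the --instances parameter.
print(",".join(instance_ids(sample)))
```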

To register your instances you run the command elb-register-instances-with-lb:

D:\aws>elb-register-instances-with-lb Test --instances i-ed156e84,i-ef156e86

INSTANCE-ID  i-ed156e84

INSTANCE-ID  i-ef156e86

You pass it the name of your ELB instance (Test in this case) and a comma separated list of the instance IDs of the EC2 instances you want this load balancer to route traffic to.

To de-register an instance you run the command elb-deregister-instances-from-lb:

D:\aws>elb-deregister-instances-from-lb Test --instances i-ed156e84,i-ef156e86

No instances currently registered to LoadBalancer

It takes the same parameters as the register command.

HTTP vs HTTPS

There is not any information on the behavior between HTTP and HTTPS connections available yet.  But I can tell you what I have experienced with my limited tests.

When using HTTP (protocol=http) it appears to not have any session stickiness.  I loaded two web servers with a Default.htm file.  Each file specified which web server I was hitting.  When I repeatedly refreshed the page it bounced back and forth between the two servers pretty consistently.

When using HTTPS (protocol=tcp) the session was sticky.  In fact I could never get it to fail over to the other node.  When I pulled up the page on a different computer though it did pull up the other web server so I know that load balancing was working.

This is far from an extensive test.  I expect more detailed tests and hopefully Amazon themselves will provide specifics soon.

Instance Health Checks

A good load balancer needs a way to check that its nodes are online and that traffic should still be routed to them.  Otherwise, if a node failed, the load balancer would continue to route traffic to it and cause partial downtime for your site.

ELB checks a file that you specify on a schedule that you specify to determine instance health.  You configure this with the elb-configure-healthcheck command.  The parameters are:

<default> Name of Load Balancer instance to configure health checks on.
--target File to read
--interval How often to perform a health check
--timeout How long to allow the server to respond
--unhealthy-threshold How many consecutive failed checks before marking node as OutOfService
--healthy-threshold How many consecutive successful checks before marking node as InService

Here is an example of configuring health checks:

D:\aws>elb-configure-healthcheck Test --target "HTTP:80/status.htm" --interval 5 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

HEALTH-CHECK  HTTP:80/status  5  3  2  2

In this example we set the file http://<node IP address>:80/status.htm to be retrieved every 5 seconds.  We allow 3 seconds for the web server to respond.  If it fails to respond 2 times in a row we take the node out of service; if it then responds successfully 2 times in a row we put it back in service.
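The two thresholds are easier to reason about as a small state machine.  The sketch below is my own model of the behavior described by the --unhealthy-threshold and --healthy-threshold parameters, not Amazon’s implementation; the function name and the starting state are assumptions.

```python
def track_health(results, unhealthy_threshold=2, healthy_threshold=2, in_service=True):
    """Replay a sequence of check results and return the node state after each.

    results is an iterable of booleans (True = check succeeded).
    A node leaves service after unhealthy_threshold consecutive failures
    and returns after healthy_threshold consecutive successes.
    """
    states = []
    failures = 0
    successes = 0
    for ok in results:
        if ok:
            successes += 1
            failures = 0  # any success breaks a run of failures
            if not in_service and successes >= healthy_threshold:
                in_service = True
        else:
            failures += 1
            successes = 0  # any failure breaks a run of successes
            if in_service and failures >= unhealthy_threshold:
                in_service = False
        states.append("InService" if in_service else "OutOfService")
    return states


# One success, two failures, then two successes, one check every interval.
for state in track_health([True, False, False, True, True]):
    print(state)
```

With an interval of 5 seconds and both thresholds at 2, a node that dies is pulled after roughly two check intervals, and a recovered node needs roughly two more before it receives traffic again.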

If we run the command elb-describe-instance-health before we configure health checks we will get the following output:

D:\aws>elb-describe-instance-health Test

INSTANCE-ID i-ed156e84  InService

INSTANCE-ID i-ef156e86   InService

However once we enable the health checks we get the following output:

D:\aws>elb-describe-instance-health Test

INSTANCE-ID i-ed156e84  OutOfService

INSTANCE-ID i-ef156e86   OutOfService

If we looked at our web server logs we would see that the load balancer tried to read the file status.htm and failed.  Once we put that file in place the nodes will go back to being InService.  This is important to note when adding health checks to a load balancer that is already in production: have your check file in place before you enable the monitoring.

You should also exclude requests for that file from your web server logs, or you will have a log entry every few seconds as the load balancer checks each node’s health.  You should also leave the file blank since there is no reason to increase traffic load with irrelevant data.

Destroying ELB Instance

An ELB instance costs $18/month without even being in use.  Not a huge amount of money, but not something you want to be paying for if you’re not using it.

To delete an ELB instance you run the command elb-delete-lb:

D:\aws>elb-delete-lb Test

Warning: Deleting a LoadBalancer can

lead to service disruption to any

customers connected to the LoadBalancer.

Are you sure you want to delete

this LoadBalancer? [Ny] y

OK-Deleting LoadBalancer

You may want to run elb-describe-lbs to confirm that you no longer have unnecessary ELB instances in place.

Remember if you delete an ELB instance you will not get the same DNS name when you recreate it.  So if you delete it you will have to update your CNAME records to reflect the changes.

Setting Up EC2 Command Line Tools on Windows

There are some great GUI tools for working with EC2 services such as ElasticFox and AWS Management Console.

And that’s just the tip of the iceberg.  However sometimes you need to use the command line tools because you want to script a task, or access features that a GUI tool doesn’t provide access to.  For example today I became motivated to finally get comfortable with the EC2 API so I could create an Elastic Load Balancer instance to test the new functionality provided.

I found lots of tutorials and guidance on setting up your Linux machine to run the tools.  Unfortunately Windows is a 2nd class citizen on AWS.  This is true pretty much across the board from command line tools, to Windows instances (just made available last October, still on Win2k3, etc).

So here is the “definitive guide” to setting up your Windows machine to run the EC2 API command line tools:

Install Java

The first requirement is to have Java 5 or later installed.  If you don’t already have Java installed for some reason go to http://www.java.com/en/download/manual.jsp#win.

Decide on AWS Root

Create a folder called AWS somewhere.  I like to make it easy to get to so I created it at d:\aws.  You can really call this folder whatever you want, but it will be where you store your certificates, your services API files, etc.

Retrieve and Store AWS Certificates

Authentication to AWS happens via a certificate and private key.  You’ll need to retrieve these files from AWS.

Go to http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key and then scroll down to the X.509 area.  You’ll need to create a new certificate.  Once you do they’ll provide you a Private Key File (pk-<random characters>.pem) and a Certificate (cert-<random characters>.pem).

KEEP THESE FILES PRIVATE.  Possession of these two files gives anyone access to your AWS account.

Configure Environment Variables

Now you need to configure your command line environment with a few environment variables.  Create a batch file in d:\aws called awsTools.bat.  Edit this file with the following text:

REM Path should have bin\java.exe under it
set JAVA_HOME="C:\Program Files (x86)\java\jre6"

REM Path to Private Key and Certificate retrieved from AWS
set EC2_PRIVATE_KEY=d:\aws\aws-pk.pem
set EC2_CERT=d:\aws\aws-cer.pem

REM Path to EC2 API, subfolders of bin and lib
set EC2_HOME=d:\aws\ec2
set PATH=%PATH%;%EC2_HOME%\bin

REM Path to ELB API, subfolders of bin and lib
set AWS_ELB_HOME=D:\aws\elb
set PATH=%PATH%;%AWS_ELB_HOME%\bin

cls

cmd

On all of the paths be careful about not including a trailing slash.

JAVA_HOME will need to be set to the appropriate path for your machine.  If you’re unsure where exactly JAVA_HOME should point, find java.exe.  It will be in a folder called bin.  You want to set JAVA_HOME to the parent directory of bin.

For example on my system you would find java.exe at “C:\Program Files (x86)\java\jre6\bin\java.exe” so I set JAVA_HOME to “C:\Program Files (x86)\java\jre6”

EC2_PRIVATE_KEY and EC2_CERT are the locations of the private key and certificate that you retrieved from the AWS website in the previous step.  I renamed my key and certificate for simplicity’s sake.  If you have multiple AWS accounts all you need to do is modify these lines to switch between accounts.

EC2_HOME and AWS_ELB_HOME both point to the folders you unzipped the API into.  Both folders should have two subdirectories called bin and lib.  Bin will contain the cmd files of the different commands for that API.  You set the path variable to include these cmd files in your path so that you do not have to be in that directory to run them.

Now you only need to run the batch file to get a command line with the environment variables set.  You could also set these variables permanently and have them available in any command window if you choose.  If you want to get fancy you could even put in the logic to set the paths based on the current directory of the batch file, and then put the folder on a thumb drive and carry it around.
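If a command fails later, the first thing worth checking is whether the variables are set and free of trailing slashes.  Here is a minimal sketch of such a check.  The variable names come from the batch file above; the function name and the idea of returning a list of problems are my own, and it does not verify that the files actually exist.

```python
import os

REQUIRED = ("JAVA_HOME", "EC2_PRIVATE_KEY", "EC2_CERT", "EC2_HOME", "AWS_ELB_HOME")


def check_env(env):
    """Return a list of human-readable problems found in the environment."""
    problems = []
    for name in REQUIRED:
        # The batch file wraps JAVA_HOME in quotes, so strip them first.
        value = env.get(name, "").strip('"')
        if not value:
            problems.append(f"{name} is not set")
        elif value.endswith(("\\", "/")):
            problems.append(f"{name} has a trailing slash: {value}")
    return problems


# Run against the current environment; prints nothing when all is well.
for problem in check_env(dict(os.environ)):
    print(problem)
```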

Test Command

If you run awsTools.bat you should have a command prompt that you can run the EC2 tools from.  A simple command to test is “ec2-describe-regions”:

D:\aws>ec2-describe-regions

REGION  eu-west-1     eu-west-1.ec2.amazonaws.com

REGION  us-east-1     us-east-1.ec2.amazonaws.com

If you get an error running this command then you need to go back and verify your installation.

Commands Documentation

I have found the Quick Reference Cards provided by Amazon to be extremely useful.  They can be found at http://aws.amazon.com/documentation/ for all the various services.

Amazon Releases Load Balancing

This morning Amazon announced the availability of a Load Balancing service for their EC2 cloud environment.  This was announced in conjunction with a monitoring service and an auto scaling service.

First of all, their load balancing solution, Elastic Load Balancing, costs $0.025 per hour for each Elastic Load Balancer plus $0.008 per GB of data transferred through an Elastic Load Balancer.  Compared to my previously suggested solution of a Linux instance running HAProxy this is a significant cost savings.  The HAProxy scenario costs $27/month for a reserved instance or $72/month for on-demand.  An Elastic Load Balancer instance costs $18/month before data transfer.
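For anyone who wants to plug in their own traffic numbers, here is a minimal sketch of the monthly arithmetic.  The two rates are the ones quoted above; the 30-day month and the 500 GB example figure are my own assumptions.

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def elb_monthly_cost(gb_transferred, hourly_rate=0.025, per_gb_rate=0.008):
    """Estimate the monthly cost in dollars of one Elastic Load Balancer."""
    return HOURS_PER_MONTH * hourly_rate + gb_transferred * per_gb_rate


print(f"Idle ELB: ${elb_monthly_cost(0):.2f}/month")
# 500 GB is an arbitrary example volume, not a measured figure.
print(f"ELB with 500 GB transferred: ${elb_monthly_cost(500):.2f}/month")
```

The idle figure is where the $18/month number comes from: 720 hours at $0.025 per hour.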

I’m working on a blog post on how to implement an Elastic Load Balancer solution.  Unfortunately right now the AWS tools do not include the functionality to work with ELB.  This means working with the command line whether you want to or not.  I will have a post up in the next couple of days if my schedule behaves.

Load Balancing IIS Web Farm on Amazon EC2

I was recently asked about load balancing IIS web farms in Amazon EC2.  In a traditional web farm (meaning hardware in a datacenter) you would do one of the following:

  • Windows Load Balancing Service (WLBS)
  • Hardware Load Balancer (F5, CoyotePoint, etc)

Windows Load Balancing Service is nice because it does not require any extra equipment to function.  The members of the IIS farm share a virtual IP address and they sort out between themselves where to route the incoming requests to.

Some people prefer a hardware load balancer over WLBS because it provides better performance.  However Microsoft uses WLBS for Microsoft.com so I question how relevant that argument is.

Other Cloud Providers

Before we talk about what to do in EC2, I wanted to point out how other cloud offerings provide a load balancing solution.

GoGrid provides you access to F5 load balancers.  Through their web based management console you can create a load balancer, list the IPs of your web servers to load balance, and they handle it all for you.  The best part is that the load balancing offering is 100% free.

Microsoft Azure and Google App Engine abstract away the load balancing issues.  One of the biggest benefits of these types of cloud offerings is that in exchange for giving up some control you get some simplicity.

EC2 and Load Balancing

When talking about EC2 and load balancing the first thing I should mention is that Amazon announced in October of 2008 that they will be offering a load balancing solution built into EC2 (http://aws.typepad.com/aws/2008/10/big-day-for-ec2.html).  I do not know when Amazon will actually release that functionality.  However when you go to the AWS Management Console the “Coming Soon” sidebar lists Tagging and then Monitoring, Load Balancing, and Auto-Scaling.  It seems logical that you’ll need to be able to group your server instances before they can provide load balancing.  So my guess is that load balancing will come after tagging. Assuming Amazon provides a useable and cost effective solution then all of our issues on this subject go away.

Why WLBS Doesn’t Work for EC2

WLBS Diagram

If you’re looking to use the Windows Load Balancing Service (WLBS) with EC2 you are out of luck.  The easiest way to explain why is to show how WLBS works.

Here you can see that each node of the web farm has its own IP address.  These machines are directly accessible via their own addresses.  They also share a virtual IP address (10.0.0.1 in the example).  The DNS entry for the website (www.website.com) points to the shared IP address, so the users’ incoming requests are received by the virtual IP address, and WLBS directs those TCP packets to one of the web servers.

The problem with EC2 is that a public Elastic IP can only be associated with a single server instance.  As you can see from how WLBS works, all of the servers in the farm need to be able to receive packets destined for the virtual IP address.

Alternatives to WLBS

HAProxy Diagram

Alternatives to using WLBS on EC2 involve quite a bit of work and some money, so Amazon providing the service in the future is very attractive.

The most common way to load balance EC2 instances is to have a front-end Linux server running HAProxy.  The problem is you’ll need a separate Linux instance in addition to your web farm.  A small Linux instance is $27/month for a reserved instance or $72/month for on-demand.  Not a deal breaker, but an added expense.  And since you’ll need a second load balancer to monitor the availability of the primary load balancer you’ll need to double those prices.

I’ll post detailed instructions on how to configure this setup in the future.  The basic steps are:

  1. Bring up instance of a Linux server using an ElasticIP
  2. Install haproxy by running: “apt-get install haproxy”
  3. Modify /etc/default/haproxy; change setting ENABLED to 1
  4. Modify /etc/haproxy.cfg to something along the lines of this file.
  5. Restart the haproxy service (service haproxy restart)
  6. Include a file in the root of each webserver called check.txt. If the load balancer cannot read this file it will not direct clients to that web server.
  7. Configure your DNS records to point to the ElasticIP address from step 1.

This will get you a non-failover load balancer solution in place.  When I write a detailed post on configuring the load balancer I’ll include instructions on doing the failover steps.
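To give a feel for what the configuration in step 4 might contain, here is a sketch that renders a minimal round-robin setup with a check.txt health check.  This is not the file referenced in the steps above; the section name, timeouts, server names, and private IP addresses are all placeholders I made up, and directive names can vary between HAProxy versions, so check the manual for the version you install.

```python
def render_haproxy_cfg(servers, check_path="/check.txt", port=80):
    """Render a minimal HAProxy config as text.

    servers is a list of (name, address) tuples for the web farm nodes.
    Every value here is illustrative, not a tested production config.
    """
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        "    timeout client 50s",
        "    timeout server 50s",
        "",
        "listen webfarm",
        f"    bind *:{port}",
        "    balance roundrobin",
        # A node that cannot serve this file is taken out of rotation.
        f"    option httpchk GET {check_path}",
    ]
    for name, address in servers:
        lines.append(f"    server {name} {address}:{port} check")
    return "\n".join(lines) + "\n"


# Hypothetical private addresses of two IIS instances.
print(render_haproxy_cfg([("web1", "10.0.0.11"), ("web2", "10.0.0.12")]))
```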

Conclusion

The more I delve into cloud technologies the more I like solutions like Azure or Google App Engine.  We just touched on load balancing, and I barely scratched the surface and it’s already a fairly complicated issue.  You still have to solve the issues of patching, scaling up and down, monitoring, etc.  The Azure/Google App Engine model takes all those issues away from you (or at least greatly simplifies them).

However the toolsets and features of Amazon Web Services continue to improve.  For example if Amazon releases a good load balancing solution that will be a huge win.  Amazon also gives you a lot more flexibility.  You’re not limited to using programming languages supported by other solutions.  You can treat Amazon almost like a traditional data center, install 3rd party software, etc.  You also can arguably migrate from EC2 to your own hardware more easily than you can from other cloud solutions.

Ultimately where you decide to host your application involves a lot of factors.  Hopefully seeing your load balancing options on EC2 (and the pain involved) helps you make the decision that is right for you.