Modern cloud computing has been popularized by the likes of Google and Amazon, which use their robust infrastructure to power very large hosting environments. Google, for its part, runs low-end hardware on a fast network backbone with a mixture of its own proprietary software and open source solutions. It spends the bare minimum on each machine but buys machines in large quantities. With this approach it is able to operate high-traffic products such as its search engine, its e-mail systems and its advertising services. By applying cloud techniques, it delivers a great deal of capacity on hardware that many other companies would refuse to touch.
Amazon has become a powerhouse in the realm of hosting and content delivery. It builds on open source solutions and adds its own code. Much of what it writes is returned to the community through open source licensing. It uses its cloud to host the Amazon stores as well as to power its successful EC2 (Elastic Compute Cloud) services. These services have quickly become well known for the robust hosting they provide, and Amazon is now one of the top providers in the market. Rather than offering specific products as Google does, Amazon provides access to the infrastructure through “instances”, which allow clients to make their own decisions about what they need. Customers purchase VPS (Virtual Private Server) instances based on how much CPU, RAM and hard drive space they need. CPU resources are often sold on a “per core” basis.
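As a rough illustration of buying an instance by resource, the sketch below computes a monthly bill from a core count, RAM and disk size. The hourly rates and the names `InstanceSpec`, `hourly_cost` and `monthly_cost` are hypothetical, invented only for this example; they are not any provider's actual prices or API.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical hourly rates in dollars, invented for illustration only.
RATE_PER_CORE = Decimal("0.02")      # per CPU core
RATE_PER_GB_RAM = Decimal("0.01")    # per GB of RAM
RATE_PER_GB_DISK = Decimal("0.0001") # per GB of hard drive space


@dataclass(frozen=True)
class InstanceSpec:
    cores: int
    ram_gb: int
    disk_gb: int


def hourly_cost(spec: InstanceSpec) -> Decimal:
    """Sum the per-resource charges for one hour of the instance."""
    return (RATE_PER_CORE * spec.cores
            + RATE_PER_GB_RAM * spec.ram_gb
            + RATE_PER_GB_DISK * spec.disk_gb)


def monthly_cost(spec: InstanceSpec, hours: int = 720) -> Decimal:
    """Bill for a month of continuous use, rounded to whole cents."""
    return (hourly_cost(spec) * hours).quantize(Decimal("0.01"))


small = InstanceSpec(cores=1, ram_gb=2, disk_gb=40)
large = InstanceSpec(cores=4, ram_gb=16, disk_gb=200)

print(monthly_cost(small))  # 31.68
print(monthly_cost(large))  # 187.20
```

Because CPU is charged per core, doubling the cores of an instance raises only the CPU portion of the bill; the RAM and disk portions are unchanged.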
This explains some of the things clouds are used for, but what exactly constitutes a cloud computing environment? Some administrators feel that because they have virtualized their servers with a product such as VMware, they are taking advantage of cloud computing. Some administrators who work with grid computing technologies feel the same way. While virtualization is a key piece of the cloud, it is not the entire solution. The same is true of grid computing: it harnesses the power of many machines together, but that idea alone is not cloud computing.
When a user is provided a virtualized machine in a cloud environment, that user has been authenticated with their own credentials and has been granted access to RAM, CPU and hard drive space that has been isolated for their use. This allows the system to provision resources for that user and no one else. For hosts like Amazon or Rackspace, one of its leading competitors, this also provides an efficient way to charge clients for their usage and to monitor their statistics. The security of the authenticated sessions also lends this hosting method the reliability and isolation that many businesses and organizations require before they will trust the cloud with their critical tasks. In these environments, the Virtual Private Server has replaced the Dedicated Server as the standard for running hardware-intensive software.
Users operating in a traditional virtualized environment also access a virtual machine through a console that assigns a certain amount of CPU, RAM and hard drive space. The key difference is that the user is not isolated from the other virtual machines in the environment. The space is not self-provisioned and, aside from logging in to the operating system being served, the user has not authenticated. Security implications aside, this is also a serious drawback for large environments like Amazon's. There is no built-in system for provisioning, and measuring usage for billing becomes much more complicated. In addition, a heavy user can affect other users more easily than under authenticated cloud provisioning. Cloud instances also provide easy-to-access statistics that make it possible to locate heavy users and manage them accordingly.
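The difference between the two models can be sketched in a few lines. The toy `CloudPool` class below is a hypothetical illustration, not the API of any real platform: it ties every allocation to an authenticated user, refuses requests that exceed the remaining capacity, and meters usage so that billing and finding heavy users become simple lookups. Credentials are stored in plain text only to keep the sketch short.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Allocation:
    cores: int
    ram_gb: int
    disk_gb: int


class CloudPool:
    """Toy resource pool in which allocations belong to authenticated users."""

    def __init__(self, cores, ram_gb, disk_gb, credentials):
        self.free = Allocation(cores, ram_gb, disk_gb)
        self.credentials = credentials       # user -> secret (plain text, sketch only)
        self.allocations = {}                # user -> Allocation
        self.cpu_hours = defaultdict(float)  # user -> metered CPU hours

    def provision(self, user, secret, cores, ram_gb, disk_gb):
        """Authenticate the user, then reserve resources for that user alone."""
        expected = self.credentials.get(user)
        if expected is None or expected != secret:
            raise PermissionError("authentication failed")
        if user in self.allocations:
            raise ValueError(f"{user} already has an allocation")
        if (cores > self.free.cores or ram_gb > self.free.ram_gb
                or disk_gb > self.free.disk_gb):
            raise RuntimeError("insufficient capacity in the pool")
        self.free.cores -= cores
        self.free.ram_gb -= ram_gb
        self.free.disk_gb -= disk_gb
        self.allocations[user] = Allocation(cores, ram_gb, disk_gb)
        return self.allocations[user]

    def record_usage(self, user, cpu_hours):
        """Meter usage against a user who holds an allocation."""
        if user not in self.allocations:
            raise KeyError(user)
        self.cpu_hours[user] += cpu_hours

    def heaviest_users(self, n=3):
        """Return the n users with the most metered CPU hours, heaviest first."""
        ranked = sorted(self.cpu_hours.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


pool = CloudPool(cores=8, ram_gb=32, disk_gb=500,
                 credentials={"alice": "s3cret", "bob": "hunter2"})
pool.provision("alice", "s3cret", cores=2, ram_gb=4, disk_gb=50)
pool.provision("bob", "hunter2", cores=4, ram_gb=8, disk_gb=100)
pool.record_usage("alice", 10.0)
pool.record_usage("bob", 42.5)

print(pool.free.cores)          # 2
print(pool.heaviest_users(1))   # [('bob', 42.5)]
```

A traditional virtualized setup, by contrast, has no equivalent of `provision` or `record_usage` at this layer, which is why billing and per-user limits have to be bolted on separately.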
Grid computing has been used for years. Many organizations that require a lot of power to process their data have built computer grids to handle the load rather than designing and building supercomputers. These grids usually combine many machines with strong CPUs and lots of RAM. The hard drive space is often only the bare minimum needed to run the operating system chosen for the grid, and the machines are connected by the best network the organization can afford. This description does seem to fit the pattern of a cloud. Both technologies use many machines to perform work more efficiently than a few powerful machines could. The important difference is in how they are used. In a typical grid environment a small number of users harness the power of the many machines in the grid to perform the same task. In scientific environments this would be performing complex calculations. For a media studio it would be rendering the detailed computer-generated graphics that are prevalent today.
The cloud is, in effect, performing the exact opposite task. It harnesses the power of many machines to serve many different users more efficiently. Each user is provided only what he or she needs, and each user performs his or her own unique tasks. Rather than wasting a separate piece of hardware on each of a Domain Controller, a File Server and a Database Server, the cloud grants each service only the resources it needs to perform its assigned tasks. A Print Server does not need the processing power or storage of a Database Server, but in an ordinary environment the database is a mission-critical service, and many administrators would not want to risk running anything else in the same server space. This leaves the wasteful option of running a whole server just for the simple task of spooling print jobs. Cloud environments eliminate the need to work out which services can safely operate alongside others. Every server can have its own space!
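The saving from consolidation can be illustrated with a small packing exercise. The sketch below places each service, as its own virtual machine, onto the first host with enough free cores and RAM, taking the largest services first (a simple first-fit decreasing heuristic chosen for illustration, not the scheduler of any particular cloud platform). The service sizes and host capacity are invented numbers.

```python
def pack_services(services, host_cores, host_ram_gb):
    """Place (name, cores, ram_gb) services onto identical hosts.

    Uses first-fit decreasing: largest services first, each placed on the
    first host that still has room. Returns one list of names per host.
    """
    hosts = []  # each host tracks its free cores, free RAM and its services
    largest_first = sorted(services, key=lambda s: (s[1], s[2]), reverse=True)
    for name, cores, ram_gb in largest_first:
        if cores > host_cores or ram_gb > host_ram_gb:
            raise ValueError(f"{name} does not fit on a single host")
        for host in hosts:
            if host["cores"] >= cores and host["ram"] >= ram_gb:
                break  # found an existing host with room
        else:
            # No existing host fits, so bring another host into the pool.
            host = {"cores": host_cores, "ram": host_ram_gb, "services": []}
            hosts.append(host)
        host["cores"] -= cores
        host["ram"] -= ram_gb
        host["services"].append(name)
    return [host["services"] for host in hosts]


# Invented resource needs: (name, CPU cores, RAM in GB).
services = [
    ("print server", 1, 1),
    ("database", 4, 16),
    ("domain controller", 1, 4),
    ("file server", 2, 8),
]

placement = pack_services(services, host_cores=4, host_ram_gb=16)
for number, names in enumerate(placement, start=1):
    print(f"host {number}: {names}")
# host 1: ['database']
# host 2: ['file server', 'domain controller', 'print server']
```

With one dedicated machine per service this workload would need four hosts; here it needs two, and the database still has a host's resources to itself.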
There is more to cloud computing than what is being offered by large corporations. Most companies can benefit from operating their own cloud. Administrators can buy VPS space from Rackspace or Amazon, or they can build a cloud with their own hardware and infrastructure. As mentioned before, most of the big providers take advantage of open source software. One of the most noteworthy packages is XCP (Xen Cloud Platform), on which Citrix has based its popular cloud systems. XCP is entirely open source but is maintained and offered free by Citrix itself. It was derived from XenServer, and includes the Xen hypervisor and all of the interfaces needed to build a cloud environment. A hypervisor is a layer that sits between the physical hardware and the software that runs on it. This arrangement allows the hypervisor to split the physical resources into “virtual hardware”, and the software runs on top as if it were installed on a physical server. XCP includes a command line tool for managing virtual machines and resource pools, but several very good graphical interfaces have also been written for it. Citrix wrote a proprietary Windows interface called XenCenter, which manages both XCP and XenServer, but it is part of the suite that is not included in the source or binaries for XCP. There is, however, a very competent open source clone called OpenXenManager, which performs many of the tasks of XenCenter. Amazon chose Xen as the basis for EC2 server virtualization, and Rackspace chose Xen and XenServer for its products. Both have implemented the technology successfully and make use of many different parts of the software.
Another platform worth considering is Eucalyptus. It is available in an enterprise version and an open source version, and runs on many different Linux distributions. One of the most popular distributions for running Eucalyptus is Ubuntu, which has been optimized for cloud computing environments for several versions now. UEC (Ubuntu Enterprise Cloud) has the Eucalyptus stack built right in (along with another stack called OpenStack). Eucalyptus itself is a very mature cloud platform with a wealth of developers working on it. It supports three major virtualization technologies: the VMware, Xen and KVM hypervisors. One popular feature of Eucalyptus is its compatibility with Amazon’s EC2 and S3 environments. This allows administrators to operate their own private cloud and interface with the public Amazon cloud. Much of the infrastructure can stay in the private cloud, giving companies the peace of mind that many of them want, while applications that consume a lot of internet bandwidth, such as e-mail and web traffic, can be placed on the public cloud to save local resources.
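The split between private and public clouds described above amounts to a placement policy. The sketch below shows one possible policy under stated assumptions: the `Workload` fields, the `choose_cloud` function and the rule that sensitive data always stays private are all hypothetical choices made for this example, and nothing here calls Eucalyptus or Amazon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    internet_bandwidth_heavy: bool  # e.g. e-mail or web traffic
    holds_sensitive_data: bool


def choose_cloud(workload: Workload) -> str:
    """Return "private" or "public" for a workload.

    Assumed policy: sensitive data stays on the private cloud no matter
    what; otherwise bandwidth-heavy, internet-facing services go public
    to save local resources, and everything else stays private.
    """
    if workload.holds_sensitive_data:
        return "private"
    if workload.internet_bandwidth_heavy:
        return "public"
    return "private"


workloads = [
    Workload("web server", internet_bandwidth_heavy=True, holds_sensitive_data=False),
    Workload("mail gateway", internet_bandwidth_heavy=True, holds_sensitive_data=False),
    Workload("payroll database", internet_bandwidth_heavy=False, holds_sensitive_data=True),
    Workload("file server", internet_bandwidth_heavy=False, holds_sensitive_data=False),
]

plan = {w.name: choose_cloud(w) for w in workloads}
for name, cloud in plan.items():
    print(f"{name}: {cloud}")
# web server: public
# mail gateway: public
# payroll database: private
# file server: private
```

Because the private cloud exposes interfaces compatible with the public one, the same tooling could in principle deploy a workload to whichever side the policy selects.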
There are many reasons why an administrator would take advantage of a private cloud. Efficiency, redundancy and economic benefits are all attractive. The general public, however, is often confused about what cloud computing really is. Labeling Gmail, Google’s e-mail service, as cloud computing is correct in a sense, but the term also carries a somewhat negative connotation in many people's minds. For many, the cloud means insecure data left in the hands of someone else. It is portrayed as throwing your data to the wind and allowing anonymous eyes to view your private correspondence. These fears are easily assuaged by explaining that a cloud can be as private as a traditional network, and even more secure if configured correctly. Clouds are still servers running software and transporting packets of data through network cables and fiber. They are simply an iteration of a redundant and stable network design.