A friend recommended AppScale as a platform for a software project. It took me some time to wrap my head around what AppScale was. Here’s my attempt to package that information in a way that would have saved me some hours, starting with some background on the problem that it solves.
What is Platform as a Service (PaaS)?
Web developers have traditionally had to wear a system administrator hat to get their stuff live. They write their web sites in PHP or Python or Ruby or whatever, but then they have to set up infrastructure. The classic example of infrastructure would be a couple web servers, a database server, and a load balancer.
The traditional way to set all that stuff up is to buy servers, configure the software, and then plug them into the internet. For many institutions, a superior option is “Infrastructure as a Service” (IaaS): renting managed servers (or virtual servers). This has advantages like spreading the cost out over a longer time, avoiding physical hardware setup, and handing off hosting concerns such as physical security and electrical / network failover. You still have to do some software configuration, though (web server software, database software, load balancer settings).
Examples of IaaS offerings are:
- Google’s “Google Compute Engine”
- Amazon’s “EC2”
PaaS takes managed hosting one step further. Instead of personally configuring the infrastructure software, you add a config file to your code that says something like “gimme a MySQL database and a cloud of 1-10 web servers that use index.php to handle all web requests.” You upload your code and config file to the PaaS system, which does the grunt work to set up the database server, web servers, and load balancer, and gets them all talking to each other, and configures the growth / shrinkage of that web cluster for you. This means you’re still specifying what the infrastructure should be, but it’s a much higher-level process, and less time-consuming.
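To make the “1-10 web servers” idea concrete, here is a minimal Python sketch of what a platform might do with such a config. Everything in it is hypothetical: `AppConfig`, its field names, and `servers_needed` are made up for illustration and are not any vendor's real config format or scaling algorithm. The only point is that you declare the bounds and the platform picks a server count inside them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical stand-in for a PaaS config file (not a real vendor format)."""
    database: str          # which database the platform should provision
    entry_script: str      # script that handles every web request
    min_web_servers: int   # lower bound of the web cluster
    max_web_servers: int   # upper bound of the web cluster


def servers_needed(config: AppConfig, requests_per_second: float,
                   capacity_per_server: float) -> int:
    """Return how many web servers to run, clamped to the configured range."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Round up: a partly loaded server still counts as a whole server.
    wanted = math.ceil(requests_per_second / capacity_per_server)
    return max(config.min_web_servers, min(config.max_web_servers, wanted))


if __name__ == "__main__":
    config = AppConfig(database="mysql", entry_script="index.php",
                       min_web_servers=1, max_web_servers=10)
    # Made-up traffic numbers, assuming each server handles 100 requests/second.
    for load in (0, 250, 5000):
        print(load, "req/s ->", servers_needed(config, load, 100), "server(s)")
```

With no traffic the count stays at the configured minimum, and a traffic spike can never push it past the configured maximum.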
Examples of PaaS offerings are:
- Google’s “Google App Engine”
- Amazon’s “Elastic Beanstalk”
To sum up: first there was running servers yourself. IaaS abstracts that to get you out of configuring hardware. PaaS adds another layer of abstraction to get you out of configuring software.
Why PaaS is Scary
PaaS solutions are generally just a layer on top of IaaS solutions, and IaaS solutions are not standardized. So if you are hosted on Google, you’re probably storing files on their proprietary “Cloud Storage”. If you’re hosted on Amazon, you’re probably storing files on their proprietary “Simple Storage Service” (S3). Either way, if you have to change hosting vendors, or if a client requires your app be installed on their premises, you’re out of luck. You’re going to have to reprogram the parts of your application that use proprietary components.
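How much you have to reprogram depends on how widely the vendor-specific calls are spread through your code. As a rough illustration (this is a general coding habit, not something Google, Amazon, or AppScale provides), the Python sketch below keeps file storage behind one small interface. `FileStore`, `LocalFileStore`, and `store_avatar` are hypothetical names; a vendor-backed store would be a second class with the same two methods, and that class would be the only part to rewrite when changing hosts.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class FileStore(Protocol):
    """The only storage operations the rest of the app is allowed to use."""

    def save(self, name: str, data: bytes) -> None: ...

    def load(self, name: str) -> bytes: ...


class LocalFileStore:
    """Plain-disk implementation; a vendor-specific one would mirror these methods."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, data: bytes) -> None:
        (self.root / name).write_bytes(data)

    def load(self, name: str) -> bytes:
        return (self.root / name).read_bytes()


def store_avatar(store: FileStore, user_id: int, image: bytes) -> str:
    """Application code: it never learns which vendor sits behind the store."""
    name = f"avatar-{user_id}.png"
    store.save(name, image)
    return name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalFileStore(Path(tmp) / "uploads")
        saved_as = store_avatar(store, 42, b"fake-image-bytes")
        print(saved_as, len(store.load(saved_as)), "bytes")
```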
The need to change hosting vendors can come up unexpectedly, even for a web developer who is an expert at traditional hosting. The high level of abstraction involved in building PaaS on top of IaaS can create surprises as operations scale up. Problems that I have seen reported on the internet include:
- Wait, so we get charged per database query?
- What do you mean my script is terminated if it runs for more than 60 seconds?
- They changed the pricing model? Our design is going to be too expensive now!
One can choose to blame the developer or the host for stuff like this, but regardless of fault, sometimes it makes sense to move to another host.
Google’s Dev Environment
Google provides an App Engine development environment. It’s an abstraction layer that allows software written with App Engine API calls to proprietary stuff (like Google Cloud Storage) to work on a plain ol’ computer, outside the cloud. This is nice for developers because we can build and troubleshoot locally without spinning up & paying for hosted resources. It’s not meant to be a production solution, though.
AppScale to the Rescue!
AppScale is an open source implementation of Google App Engine. You load it onto your machine a lot like the Google App Engine dev environment. You can then run the software you wrote for App Engine locally on your computer. Or you can provide a slightly different command and deploy the software to Google. Or you can vary that command a bit further to deploy the software to Amazon. So it’s a “get out of jail free” card (or “get out of Google free”.)
You don’t really write applications to “run on AppScale”. You write them to run on Google App Engine, and AppScale does some abstraction magic to make it work anywhere you designate. The deployment target can be Google, Amazon, or a server or cloud of servers that you store in your closet. Of course, the autoscaling features only work on Google or Amazon. AppScale can’t add another server to your closet.
Replacing Google Costs Some RAM
AppScale needs 4GB of RAM. That’s beyond the “free” tier of Google or Amazon. My desktop has 4GB of RAM, so when I try to run AppScale in addition to an OS, the computer slows to an unusable crawl. I don’t know what component(s) of AppScale have this requirement. I’m hoping it’s only the master node that needs this, because if all servers launched by AppScale need 4GB plus room for everything else, that could be a very expensive cluster of web servers. Maybe it’s configurable? More investigation is warranted.
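If you want to check a machine before trying, here is a small Python sketch. It assumes a Linux-like system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; on other systems it just reports that it can't tell. The 4GB figure is the one quoted above, and the extra 1GB of headroom for the OS is my own guess, not an AppScale number.

```python
import os
from typing import Optional

APPSCALE_RAM_BYTES = 4 * 1024**3   # the 4GB figure quoted above
OS_HEADROOM_BYTES = 1 * 1024**3    # assumption: spare room for the OS itself


def total_ram_bytes() -> Optional[int]:
    """Return physical RAM in bytes, or None if this platform can't report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def can_host_appscale(total_bytes: int,
                      headroom_bytes: int = OS_HEADROOM_BYTES) -> bool:
    """True if the machine has the 4GB AppScale wants plus the given headroom."""
    return total_bytes >= APPSCALE_RAM_BYTES + headroom_bytes


if __name__ == "__main__":
    ram = total_ram_bytes()
    if ram is None:
        print("Could not determine RAM on this platform.")
    elif can_host_appscale(ram):
        print("Probably enough RAM to try AppScale locally.")
    else:
        print("Not enough RAM; expect an unusable crawl.")
```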
If you want to grok AppScale, don’t start at the AppScale web site! Start by playing with Google App Engine. Doing their Python tutorial was a breeze. When I tried the PHP tutorial, I was quickly bogged down in installing software from source and troubleshooting compiler dependency problems.
Once you feel like you know what App Engine is all about, THEN play with AppScale. But don’t try to follow the instructions on the AppScale web site; those look impressive from a marketing perspective, but there are missing steps. Instead go to https://github.com/AppScale/appscale/wiki/AppScale-on-VirtualBox . That tutorial directs you to install a virtual machine that already has AppScale configured. Once installed, you can SSH into the machine to edit code, and you can point your browser at the machine to see your app or a nice administrative interface that shows you how busy the app is. In this configuration, AppScale runs all services (web, database, etc.) on a single server, and does no autoscaling. But remember: don’t even try this unless you have well over 4GB RAM. If you don’t, you’ll have to pay Amazon or Google a few bucks to rent one of their servers and try it there.