One of the nice things about our cloud platform (Storm On Demand / Smart Servers) is the ability to easily provision load balancing between nodes. This can be used for a multitude of purposes, from increasing overall performance, to mitigating downtime due to server problems.
However, if you're considering going with a load balanced setup, there are a few key issues to be cognizant of.
One thing to keep in mind is that you can't really "load balance" databases. You could do "master/master" replication, but that has its own set of problems.
When dealing with a load balanced setup, 99.9999% of the time you're going to want a separate database back end. Fortunately, it's easy enough to do since you could create an instance strictly for MySQL (or Postgres, or whatever your particular poison is), and then private network the front and back end nodes.
Once that is done, it's just an issue of having the application point externally to the DB back end to start talking to it (make sure the back end can take external connections).
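Before repointing the application, it can help to confirm from each front end node that the database port is actually reachable over the private network. The following is a minimal sketch, not a supported tool: the function name is made up for illustration, and the default port of 3306 assumes a stock MySQL install.

```python
import socket


def can_reach_database(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, calling `can_reach_database("10.0.0.10")` from a front end node (where 10.0.0.10 is a placeholder for the back end's private address) should return True once the database is listening on its private interface and accepting remote connections. A True result only shows the port is open; the database user still needs to be allowed to log in from the front end addresses.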
This is probably the easiest concern to address when looking at load balancing - on our cloud platform or otherwise.
The following examples are assuming a single site. Adding sites would require additional tweaking/sync of the Apache config itself.
The other key issue that needs to be addressed when it comes to load balancing is how static content (images, PHP scripts, CSS files, etc.) is going to be synchronized across nodes.
Before I continue, I would like to make one thing clear: with a traditional setup, we like to have our technical sales team sit down with customers on a conference call to discuss their needs. This way we can custom engineer a solution to fit those needs.
Most of the time, and depending on the operating system, this would either involve utilizing a SAN and OCFS2 (Linux), or utilizing DFS (Windows) for the storage of static data.
However, the above example solutions are managed services that we offer and support only for traditional dedicated solutions. At this time, we do not have any managed solutions for Storm/SmartServers as there is no connectivity to the SAN network.
So what's a cloud customer to do?
Well, there are a couple of ways to go about doing this, all of which fall under best effort support. Again - we can't officially support any of these. Now that my disclaimer is out there, let's move on.
rsync and cron
Depending on the circumstances, running rsync via cron is possibly the easiest way of maintaining data synchronization across load balanced nodes. Here is a really rough checklist for determining if this method will be feasible:
- Content does not update frequently - for example, users will not be uploading images or other content at continuous intervals.
- Content uploads/changes will be performed on a single node, to be propagated to the other nodes.
- Small number of nodes.
Basically, the rsync/cron option has some limitations. The big thing is that you are really relegated to point-to-multipoint communications. You could theoretically set up rsync on each node to push to every other node; however, that becomes a nightmare as you add more nodes. There is also the issue of run timing for rsync - I'm not sure how well things would work out if all nodes were blasting each other at the same time.
Then of course there is the timing issue - what happens if you update a file and need it pushed to the nodes prior to the rsync run being fired off? Kind of defeats the purpose of utilizing cron.
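To make the point-to-multipoint idea concrete, here is a rough sketch of a push script that the single "source" node could run from cron. The content path and node addresses are placeholders, the function names are my own, and it assumes key-based SSH logins from the source node to the others are already in place.

```python
import subprocess

# Hypothetical values: adjust the path and node addresses to your setup.
CONTENT_DIR = "/var/www/html/"
TARGET_NODES = ["10.0.0.12", "10.0.0.13"]


def build_rsync_command(source_dir: str, node: str) -> list[str]:
    """Build an rsync command that mirrors source_dir onto the same path on node."""
    # A trailing slash makes rsync copy the directory's contents rather than
    # nesting the directory itself inside the destination.
    source = source_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-az",        # archive mode (permissions, times, symlinks) plus compression
        "--delete",   # remove files on the target that no longer exist on the source
        "-e", "ssh",  # transport over SSH; assumes key-based login is configured
        source,
        f"{node}:{source}",
    ]


def push_to_nodes(source_dir: str, nodes: list[str]) -> dict[str, int]:
    """Run rsync against each node in turn and return the exit code per node."""
    results = {}
    for node in nodes:
        completed = subprocess.run(build_rsync_command(source_dir, node))
        results[node] = completed.returncode
    return results
```

Calling `push_to_nodes(CONTENT_DIR, TARGET_NODES)` at the bottom of the script and scheduling it with a crontab entry such as `*/5 * * * * /usr/bin/python3 /usr/local/bin/push_static.py` (a made-up path) would push changes every five minutes. Be careful with `--delete`: anything uploaded directly to a target node gets wiped on the next run, which is exactly why changes need to happen on a single node with this approach.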
Gluster
Don't like the rsync/cron option? Got frequent updates? Don't worry! There is hope (best effort support, but still hope).
Before I talk about what Gluster actually is, let's do a quick recap of why rsync/cron won't work for you:
- You have a lot of nodes.
- Content additions/changes will be performed on the various nodes.
- Content additions/changes are quite frequent, or even constant.
So what is Gluster? Essentially it is a way to roll your own SAN. It's not too horrendously difficult to use (at least from my perspective) either.
Utilizing Gluster, you can create multiple "bricks" (storage nodes), which you can then mount on your load balanced nodes. This way when static content is updated, added, or deleted, the change would be essentially instantaneous across all the other nodes.
You might be thinking, "Now why not do this for everything?" Well, that's a legit question, and here is the answer: you would really want to create separate instances for the storage nodes, and more nodes = more money. You could certainly use a single node to start, but ideally you would want to have at least two. This would allow for redundancy in the event one of the nodes went down. Basically, you could establish what amounts to network RAID1.
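Without getting into installation, the general shape of that "network RAID1" layout can be sketched as the handful of commands involved. This is only an illustration under assumptions: the helper name, volume name, brick paths, and addresses are all made up, and it assumes Gluster is already installed and the storage nodes have already been joined together with `gluster peer probe`.

```python
def gluster_setup_commands(volume: str, bricks: list[str], mount_point: str) -> list[str]:
    """Return the shell commands for a volume replicated across the given bricks."""
    if len(bricks) < 2:
        raise ValueError("replication needs at least two bricks")
    brick_args = " ".join(bricks)
    # Any storage node can serve the mount; this sketch uses the first one.
    first_host = bricks[0].split(":", 1)[0]
    return [
        # Run once on a storage node: every file is written to each brick.
        f"gluster volume create {volume} replica {len(bricks)} {brick_args}",
        f"gluster volume start {volume}",
        # Run on each load balanced web node to mount the shared volume.
        f"mount -t glusterfs {first_host}:/{volume} {mount_point}",
    ]
```

For instance, `gluster_setup_commands("webdata", ["10.0.0.20:/data/brick1", "10.0.0.21:/data/brick1"], "/var/www/html")` returns the create, start, and mount commands for a two-brick mirror. With only two replicas, it is worth reading up on how Gluster handles split-brain before relying on this for anything important.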
Installation/use of Gluster is beyond the scope of this article, but if there is interest, I could certainly do a write up on that as I get the time.
Load balancing on Storm/SmartServers is a bit tricky right now given there is no particular managed product for it. However, I hope the two approaches I've covered for static data replication and synchronization give you somewhere to start. But again, remember, either of these options would be considered best effort when it comes to support.
Don't hesitate to contact me at 1-800-580-4985 if you have further questions regarding a load balanced Storm or SmartServer setup.