Scaling Drupal (on the cheap)
Our young company uses Drupal to support our most important web sites, including this one. Our biggest site is Shoe Idiot, which has about 800,000 unique URLs because its database includes over 650,000 different shoes. It is the largest, but all of our sites are product focused and therefore have big fat databases.
Each of our Drupal sites has its own database but shares a common code base. Some of the sites use unique modules, so they have their own modules directory located in their site-specific directory structure, but most of the code is shared across all of the sites, which include:
- Shoe Idiot
- IT Leadership
- Styles for Living
- Styles for Bath
- Styles for Bedroom
- Styles for Children
- Styles for Kitchen
- Styles for Garden
- Styles for Living Room
- Styles for Pets
- Styles for Office
- Stylish Destinations
- My Healthy Lifestyle News
- Crash Land, LLC
Drupal has been a real blessing in making these sites do exactly what I had envisioned for them. Stock Drupal is fine for a relatively small site, but once you've grown to the point that you have thousands of nodes, Drupal starts to have a few real problems, especially with the core search module. This module was the first to go as we worked to scale our sites.
Apache Solr To The Rescue!
Setting up Solr to handle the searches on the site was rather simple. There is a Drupal module, Apache Solr Search Integration, that takes care of making Drupal work properly with Solr. All you have to do is configure a Solr server first, which is also really easy to do on a Linux system using packages. Our setup uses one instance of Solr with multiple cores, one for each of the sites. Our directory structure looks like:
.
|-- etc
|-- lib
|   `-- jsp-2.1
|-- logs
|-- solr
|   |-- bath
|   |   |-- conf -> ../conf
|   |   `-- data
|   |       |-- index
|   |       |-- spellchecker1
|   |       `-- spellchecker2
|   |-- bedroom
|   |   |-- conf -> ../conf
|   |   `-- data
|   |       |-- index
|   |       |-- spellchecker1
|   |       `-- spellchecker2
As you can see, there is a single web application in the directory "drupal", which includes a subdirectory, solr, which in turn has a directory for each of the cores. Since all of the cores are based on Drupal using the same modules, each core's conf directory is symlinked to one place. That way their configurations are guaranteed to be identical.
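The layout above could be created with a few lines of script. This is only a minimal sketch, not the procedure we actually followed: the function name create_cores is made up for illustration, and it assumes a Unix-like system where symlinks are available. It builds one directory per core, each with its own data directory and a conf entry that points back at the single shared conf directory.

```python
import os
import tempfile


def create_cores(solr_home, core_names):
    """Create one directory per core, each with a private data dir
    and a conf symlink pointing at the single shared conf directory."""
    shared_conf = os.path.join(solr_home, "conf")
    os.makedirs(shared_conf, exist_ok=True)
    for name in core_names:
        core_dir = os.path.join(solr_home, name)
        # Each core keeps its own index data.
        os.makedirs(os.path.join(core_dir, "data"), exist_ok=True)
        link = os.path.join(core_dir, "conf")
        if not os.path.islink(link):
            # Relative target, matching the "conf -> ../conf" entries in the tree.
            os.symlink(os.path.join("..", "conf"), link)
    return shared_conf


if __name__ == "__main__":
    # Demo in a throwaway directory; the core names are taken from the tree above.
    with tempfile.TemporaryDirectory() as home:
        create_cores(home, ["bath", "bedroom"])
        for core in sorted(os.listdir(home)):
            print(core)
```

Because every core resolves conf to the same directory, a change to the shared schema or Solr configuration is seen by all cores at once.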
Gettin' All Fancy with Solr and Views
Once we got rid of the useless Drupal core search, we turned our attention to understanding exactly what we could do with Solr. We were already making lots of use of the Views module, so adding the Apache Solr Views module was a no-brainer. To really get the most out of the combination of the two, we did have to upgrade from Views 2 to Views 3. Views 3 isn't quite ready yet, and we have run into some quirks and bugs because we are running the development version, but the tiny bit of pain is worth the very sweet gains. How are we using it? Almost everywhere on the sites. For example, if you visit the Shoe Coupons page on Shoe Idiot, you are using Solr's faceted search ability. That is what allows you to simply click on the name of a store to limit the shoe coupons to that particular store.
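To give a feel for what a faceted request looks like on the wire, here is a minimal sketch that builds a Solr select URL with faceting turned on. The Drupal modules build these requests for you, so this is purely illustrative. The core URL and the field name store_name are assumptions, not our real schema; the parameters themselves (facet, facet.field, facet.mincount, fq) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def facet_query_url(core_url, text, facet_field, selected=None):
    """Build a Solr select URL that returns facet counts for one field.

    If `selected` is given, a filter query narrows the results to that
    facet value, which is what clicking a store name would do.
    """
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),  # hide facet values with no matches
        ("wt", "json"),
    ]
    if selected is not None:
        # Escape backslashes and quotes so the value stays one quoted phrase.
        value = selected.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", '{}:"{}"'.format(facet_field, value)))
    return core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core URL, field name and store name.
    base = "http://localhost:8983/solr/shoes"
    print(facet_query_url(base, "coupon", "store_name"))
    print(facet_query_url(base, "coupon", "store_name", "Example Store"))
```

The first URL asks for all coupons plus a count per store; the second adds the filter for one store while still returning facet counts.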
Next, the combination of Views 3 and Solr allows us to create the pages for each individual shoe. These pages include all of the relevant information about a particular shoe, including description and price. Solr + Views also allows us to show similar shoes and similar coupons. These are special views that, when given the ID of a node, use Solr to return similar items.
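One way Solr can return similar items for a given document is its MoreLikeThis feature. The sketch below builds such a request by hand. I am not claiming this is exactly the request the Apache Solr Views module sends; the core URL, the id key field and the title and body field names are all assumptions made for illustration. The mlt parameters are standard Solr ones.

```python
from urllib.parse import urlencode


def similar_items_url(core_url, node_id, fields=("title", "body"), count=5):
    """Build a Solr select URL asking for documents similar to one node."""
    params = [
        ("q", "id:{}".format(node_id)),  # look up the source document
        ("mlt", "true"),                 # enable MoreLikeThis
        ("mlt.fl", ",".join(fields)),    # fields compared for similarity
        ("mlt.count", str(count)),       # how many similar docs to return
        ("mlt.mintf", "1"),              # minimum term frequency
        ("mlt.mindf", "1"),              # minimum document frequency
        ("wt", "json"),
    ]
    return core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core URL and node ID.
    print(similar_items_url("http://localhost:8983/solr/shoes", 12345))
```

A view that shows "similar shoes" only needs the node ID of the current page to issue a request of this shape.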
Easy Load Balancing with Pound
The next step we needed to take to provide more capacity for processing all of our data was to implement a way to spread the load across all of our servers. We currently use three servers to run our system: one for MySQL, one for Apache, and one for Solr and memcached. To provide additional processing power and memory for some rather hefty httpd children, we implemented Pound and set up Apache on both the MySQL server and the Solr server. This is not ideal. Those servers should be left to do their particular jobs, but we needed more capacity because of the memory footprint of our httpd children. The Apache server runs Pound and handles most of the web requests. It also exports the web tree over NFS so that both the MySQL server and the Solr server can access the files. Only a small number of requests are sent to these two machines, just enough to prevent the primary Apache server from running out of memory under normal circumstances.
Installing Pound was as easy as running yum install pound and then editing the sample /etc/pound.cfg to meet our needs. Since we run CentOS, I added the CentOS testing repository to our yum configuration, which is where the Pound packages were found. I realize that I could have accomplished the same thing with Apache's proxy modules. However, given the already large size of the httpd processes, I didn't want to add anything else, and I like that Pound is a simple, small process that is distinct from our web server applications. Getting it running and sending connections to the other servers was really quite easy to do.
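The idea of sending most requests to one server and only a trickle to the other two can be illustrated with a small weighted-selection simulation. This is a sketch of the general technique, not of Pound's actual algorithm, and the backend names and the 8:1:1 weights are made up for the example rather than taken from our configuration.

```python
import random
from collections import Counter

# Hypothetical backends and weights: the primary web server takes
# most of the traffic, the other two take only a small share.
BACKENDS = {"web": 8, "mysql": 1, "solr": 1}


def pick_backend(backends, rng):
    """Pick one backend at random, in proportion to its weight."""
    names = list(backends)
    weights = [backends[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(backends, requests, seed=0):
    """Distribute `requests` across the backends and count the results."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return Counter(pick_backend(backends, rng) for _ in range(requests))


if __name__ == "__main__":
    for name, hits in simulate(BACKENDS, 10000).most_common():
        print(name, hits)
```

With weights like these, the two secondary machines together see roughly a fifth of the requests, which is the kind of relief valve described above.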
Nginx: Small, Fast and Perfect for Images
Another technique that we used to increase capacity was setting up the very small and fast Nginx server. We use Nginx for serving static files like images. It is our pseudo-CDN. A URL of the form http://images.crash-land.com/images/image-name.jpg is what the src looks like in most of the image tags across all of our sites. This reduces the number of big fat Apache processes needed to service requests and therefore increases our capacity. This was also rather simple to put in place by installing packages and then configuring the server.
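The key to this approach is that every image tag points at the static host rather than at the Drupal site itself. Here is a minimal sketch of a helper that does that. The base URL comes from the example above, but the function names image_url and image_tag are hypothetical, and in practice this logic would live in the site's theme layer rather than in a standalone script.

```python
import html
from urllib.parse import quote

# Base URL of the static image host, taken from the example URL above.
STATIC_BASE = "http://images.crash-land.com/images/"


def image_url(filename):
    """Return the URL on the static host for an image file name."""
    # Percent-encode spaces and other unsafe characters in the name.
    return STATIC_BASE + quote(filename.lstrip("/"))


def image_tag(filename, alt=""):
    """Return an img tag whose src points at the static host."""
    return '<img src="{}" alt="{}">'.format(
        image_url(filename), html.escape(alt, quote=True)
    )


if __name__ == "__main__":
    print(image_tag("image-name.jpg", "Example shoe"))
```

Keeping the host name in one place also makes it easy to move the images to another server later.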



Comments
Thanks for the great article. I am just about to do the same thing on an Audio Video ecommerce site.
Any gotchas on upgrading from Views 2.x to 3.x?