Working With Memcached in WordPress
In the early 2000s, LiveJournal dominated the blogging world. While LiveJournal is known as a pioneer of online communities, many may not be aware that its creators are also responsible for one of the most important caching technologies currently powering the web: memcached (pronounced “mem-cache-dee”). Memcached is the caching engine behind Facebook, Twitter, and, a favorite at 10up, WordPress.com. Even though memcached is a stable and mature caching system, it has nuances that can make it difficult to tame. Given that our work at 10up frequently involves development within memcached environments, we have become quite familiar with the ins and outs of the tool. In this article, I share some of my insights, cautions, and thoughts on developing in a memcached environment.
A Note on Memcached in WordPress
The official Memcached website offers the best definition for what memcached is:
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
By storing frequently accessed data in memory as opposed to on disk or in the database, memcached offers faster access to data and puts less strain on the database. Out of the box, WordPress does not use memcached. To take advantage of it, you need to install the necessary dependencies on your server, as well as a caching add-on that supports memcached, such as the Memcached Object Cache drop-in. The drop-in causes WordPress to load a memcached-compatible version of the WP_Object_Cache class instead of the built-in version, which only offers runtime object caching. Once these elements are in place, WordPress will use memcached to store objects.
It is worth noting that the nomenclature in the memcached world can be very confusing. The “in-memory data storage daemon” is known as memcached. The term “memcache” (without a “d”) tends to refer to the original PECL extension, which is the more widely used memcached library. To further confuse the point, Digg released a PECL extension called “memcached” in an effort to take advantage of more memcached features (e.g., multiget, multiset, and check and set). The WordPress Memcached Object Cache uses the memcache (no “d”) extension. Scott Taylor has recently begun work on a new Memcached Object Cache that uses the memcached (+d) PECL extension (note that this is VERY beta at the time of this writing). For a comparison of the two PECL extensions, please see Brian Moon’s article on the differences between the two extensions, as well as the PHP Client Comparison table on Google Code.
Assume That Cache Misses will Occur
Memcached is not perfect, nor is it meant to be. In a highly distributed environment with numerous memcached instances, a query for a value stored in memcached will sometimes fail to locate the key. This usually happens for one of three reasons: the value has expired, the value has been evicted, or the key was not found on the instance queried.
When adding or setting values in memcached, you can optionally set an expiration time. If that time has passed when you query for the key, the key will not be found. Interestingly, the key/value pair may still exist in the cache, but because the expiration time has passed, no value will be returned. The data may also have been evicted from the cache. Memcached is a Least Recently Used (LRU) cache, which attempts to fight “cache pollution” by removing objects from the cache that have not been used recently. This leaves more space in the cache for items that are “hotter”, or used more frequently. As such, an object requested from the cache may not be found because the LRU algorithm evicted it. Finally, while memcached attempts to get the key from the memcached instance it was stored on, it does not always succeed (e.g., server outage, issue with determining which machine stored the key). In such situations, the value will not be located.
All that detail is to say that memcached environments are built to produce cache misses and it is the job of the developer to prepare for that event. It is important that when getting data from memcached, a check is performed to verify that good data has been received, and in the event that it has not, a “fallback” plan is initiated. I briefly discussed some fallback plans during my WordCamp San Diego 2012 talk. These cache misses occur more frequently than one would expect, so it is important to plan for such a scenario.
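As a minimal sketch of the check-and-fallback pattern described above, the following treats anything other than the expected data as a cache miss and regenerates it. Everything prefixed with demo_ or Demo_, the 300-second expiration, and the sample post IDs are hypothetical; the stand-in wp_cache_* functions exist only so the example runs outside WordPress and are skipped when the real ones are loaded.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. Inside WordPress the real
// wp_cache_* functions already exist and this block is skipped.
if ( ! function_exists( 'wp_cache_get' ) ) {
	final class Demo_Cache_Store {
		public static $data = array();
	}
	function wp_cache_get( $key, $group = '' ) {
		$id = $group . ':' . $key;
		return array_key_exists( $id, Demo_Cache_Store::$data ) ? Demo_Cache_Store::$data[ $id ] : false;
	}
	function wp_cache_set( $key, $data, $group = '', $expire = 0 ) {
		Demo_Cache_Store::$data[ $group . ':' . $key ] = $data;
		return true;
	}
}

// Hypothetical expensive lookup; replace with the real query or API call.
function demo_generate_most_viewed() {
	return array( 12, 48, 7 );
}

function demo_get_most_viewed() {
	$posts = wp_cache_get( 'most-viewed', 'top-posts' );

	// Verify that good data came back; anything else counts as a miss.
	if ( ! is_array( $posts ) ) {
		$posts = demo_generate_most_viewed();
		wp_cache_set( 'most-viewed', $posts, 'top-posts', 300 );
	}

	return $posts;
}
```

Checking the type of the returned value, rather than only comparing against false, also guards against a stored value that is not what the calling code expects.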
Groups Are not a Memcached Concept
In WordPress, the wp_cache_* functions accept not only a “key” argument, but also an optional “group” argument. The value of the group argument is prepended to the key when the object is stored in memcached. For example, if you were to assign an object to cache with wp_cache_add( 'most-viewed', $most_viewed_articles, 'top-posts' ); for a site with a $blog_id of 4293, the actual key would be “4293:top-posts:most-viewed”. At best, the group argument provides namespacing for the key. It is not a core memcached feature, and the value is not stored any differently in the cache based on its group.
This can be misleading because it suggests that if a value can be stored by group membership, the whole group can be easily invalidated. Unfortunately, this is not the case (note that Scott Taylor has an excellent plugin, Johnny Cache, that can invalidate by group). My assumption is that this group argument was added to the WordPress memcached backend for compatibility with WordPress’ WP_Object_Cache class for runtime caching, which does support grouping cached values. All that said, you can leverage an “incrementor” as part of a key to invalidate large parts of the cache, but that is very different from invalidating by group.
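One way the “incrementor” idea might look is sketched below: a version string is stored in the cache and appended to every key, so replacing the version makes all of the old keys unreachable at once. The demo_ function names and the use of uniqid() for the version are assumptions made for illustration, and the stand-in wp_cache_* functions are only there so the example runs outside WordPress.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. Inside WordPress the real
// wp_cache_* functions already exist and this block is skipped.
if ( ! function_exists( 'wp_cache_get' ) ) {
	final class Demo_Cache_Store {
		public static $data = array();
	}
	function wp_cache_get( $key, $group = '' ) {
		$id = $group . ':' . $key;
		return array_key_exists( $id, Demo_Cache_Store::$data ) ? Demo_Cache_Store::$data[ $id ] : false;
	}
	function wp_cache_set( $key, $data, $group = '', $expire = 0 ) {
		Demo_Cache_Store::$data[ $group . ':' . $key ] = $data;
		return true;
	}
}

// Return the current incrementor, creating a new one when it is missing
// or when a refresh is requested.
function demo_get_incrementor( $refresh = false ) {
	$incrementor = wp_cache_get( 'incrementor', 'top-posts' );

	if ( false === $incrementor || $refresh ) {
		$incrementor = uniqid();
		wp_cache_set( 'incrementor', $incrementor, 'top-posts' );
	}

	return $incrementor;
}

// Build a key that is only valid for the current incrementor.
function demo_versioned_key( $name ) {
	return $name . ':' . demo_get_incrementor();
}
```

Calling demo_get_incrementor( true ) changes every key built by demo_versioned_key(). The old values are not deleted; they simply stop being requested and are left for memcached to expire or evict. If the incrementor itself is evicted, a new one is created and the effect is the same as an invalidation.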
Invalidation is Difficult
When developing an application that uses a cache (regardless of type or implementation), determining how the cache will be invalidated can be particularly tricky. Given that memcached does not offer flushing of groups, designing the invalidation routines for the caching layer presents some challenges. As such, the first thing I consider when caching data is how I will eventually invalidate that data. If I can determine what I will need to do to invalidate the data, it often simplifies my strategy for generating that data in the first place. When reviewing code and reading about caching, it seems that data invalidation is often an afterthought. After a system for caching data is built, it can be difficult to figure out how to invalidate the data, and therefore invalidation gets neglected.
I have found that asking myself two questions can help with this problem: 1) When should the data be generated? 2) When should the data be invalidated? By doing so, I tend to orient myself more to the question of how to “refresh” the data as opposed to how to “generate” the data in the first place. There is a subtle but meaningful difference between the two. When I think of “refreshing” the data, I consider how I transition from one version of the data to the next. If I think about “generating” the data, I tend to only think about obtaining the data and neglect the importance of the shift between sets of data. This has recently led me to release a plugin, A Fresher Cache, that allows for easily calling functions that cause this data to be “freshened”. This tool is a major time saver when developing cached components of an application, and it encourages and rewards me for thinking through how to invalidate and refresh caches.
Locking is Nearly Impossible
In the event that you are using a distributed memcached environment, it is likely because you are dealing with a high traffic site. Inherent in working with high traffic sites is dealing with race conditions. A race condition, or the “stampeding herd” problem, occurs when two different requests compete for a shared resource. This is typically more of an issue for developers dealing with multi-threaded applications, but the general principle applies for web developers dealing with large amounts of traffic. For example, imagine that you have an expensive query that generates a complex data table. This table is generated using external HTTP requests, as well as intensive database queries. The data, once generated, is cached in memcached; however, as mentioned before, you need to prepare for it magically being evicted from the cache. If the value is not in the cache, you may write your application to regenerate the data on the fly. The problem with this is that in high concurrency environments, multiple requests that generate the same data may occur. This can increase the load on the server and cause performance issues for the site, or worse yet, bring it crashing down.
One technique for dealing with the stampeding herd is to use a lock. A lock signals to the application that the data is being generated and “locks” the application from trying to generate that data again on a subsequent request. The memcached docs refer to this as the “ghetto locking” method for stopping the stampeding herd. The issue with high concurrency is that locks in a distributed memcached environment may not be set quickly enough for the application to be aware that the lock is set and to prevent further requests for the data. In a single memcached server environment this is not an issue. When you scale to many memcached servers, it immediately becomes an issue because the requests may come in so quickly that the network of memcached machines is not aware of the lock.
The best defense against this issue is to attempt to generate data on scheduled events or admin events (i.e., situations where concurrency is unlikely) and program fallbacks that are less intensive than the initial data request (I have argued for storing a backup copy of the data in some situations elsewhere). Another way to handle the issue is to create a lock that uses a non-memcached resource, such as a MySQL database. MySQL has support for locking and if the lock is set and read from a single database server in a multi-database server environment, the lock should be stable. I would recommend attempting to develop the application without a need for locking first and only deal with locking issues if absolutely necessary.
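For illustration only, here is a rough sketch that combines an add-based lock with a cheaper fallback. It relies on wp_cache_add() succeeding only when the key does not already exist. The demo_ names, key names, and expiration times are hypothetical, and the backup copy is kept in the object cache purely for brevity; given the cautions above about locks in distributed environments, treat the lock as best-effort rather than a guarantee.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. Inside WordPress the real
// wp_cache_* functions already exist and these blocks are skipped.
if ( ! function_exists( 'wp_cache_get' ) ) {
	final class Demo_Cache_Store {
		public static $data = array();
	}
	function wp_cache_get( $key, $group = '' ) {
		$id = $group . ':' . $key;
		return array_key_exists( $id, Demo_Cache_Store::$data ) ? Demo_Cache_Store::$data[ $id ] : false;
	}
	function wp_cache_set( $key, $data, $group = '', $expire = 0 ) {
		Demo_Cache_Store::$data[ $group . ':' . $key ] = $data;
		return true;
	}
}
if ( ! function_exists( 'wp_cache_add' ) ) {
	function wp_cache_add( $key, $data, $group = '', $expire = 0 ) {
		if ( false !== wp_cache_get( $key, $group ) ) {
			return false; // Key already exists, so the add fails.
		}
		return wp_cache_set( $key, $data, $group, $expire );
	}
	function wp_cache_delete( $key, $group = '' ) {
		unset( Demo_Cache_Store::$data[ $group . ':' . $key ] );
		return true;
	}
}

// Hypothetical expensive build step (external HTTP requests, heavy queries).
function demo_build_report() {
	return array( 'generated' => time() );
}

function demo_get_report() {
	$report = wp_cache_get( 'report', 'demo' );
	if ( false !== $report ) {
		return $report;
	}

	// Only the request that manages to add the lock key rebuilds the data.
	if ( wp_cache_add( 'report-lock', 1, 'demo', 30 ) ) {
		$report = demo_build_report();
		wp_cache_set( 'report', $report, 'demo', 300 );
		// Longer-lived backup copy to serve while a rebuild is in progress.
		wp_cache_set( 'report-backup', $report, 'demo', 3600 );
		wp_cache_delete( 'report-lock', 'demo' );
		return $report;
	}

	// Another request holds the lock: serve the backup, or an empty result.
	$backup = wp_cache_get( 'report-backup', 'demo' );
	return ( false !== $backup ) ? $backup : array();
}
```

The 30-second expiration on the lock is a safety net so that a request that dies mid-build does not block regeneration indefinitely.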
1 MB Object Size Limit
It is important to note that memcached objects are limited to 1 MB in size. Additionally, object keys are limited to 250 bytes. Understanding these limits is especially important when developing with the transient functions in WordPress. Since the transient functions use WordPress’s object cache if it has been enabled (which is where memcached plugs in), storing data or keys that exceed these limits can cause issues that are very difficult to debug. Transients that are stored in the database have an approximately 4 GB size limit and a 64-character key length limit (although 19 characters are reserved for the “_transient_timeout_” prefix, reducing the key length to 45 characters). As a result, I recommend always thinking about caching objects in WordPress with a 1 MB size limit and a 45-character key limit, regardless of whether the transient or wp_cache_* functions are being used. This should maximize compatibility amongst systems.
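A small guard along these lines could catch oversized keys and values before they are stored. The constant and function names are hypothetical. Measuring the value with serialize() is only an approximation of what actually reaches memcached, since the stored item also carries the key and some overhead, so the usable size is slightly under 1 MB.

```php
<?php
// Conservative limits taken from the text: 45-character keys, 1 MB values.
define( 'DEMO_MAX_KEY_LENGTH', 45 );
define( 'DEMO_MAX_VALUE_BYTES', 1048576 );

// Return true when the key and value fit the "least common denominator"
// limits for both memcached and database-backed transients.
function demo_fits_cache_limits( $key, $value ) {
	if ( strlen( $key ) > DEMO_MAX_KEY_LENGTH ) {
		return false;
	}

	// Approximate the stored size using the serialized form of the value.
	return strlen( serialize( $value ) ) <= DEMO_MAX_VALUE_BYTES;
}
```

For example, demo_fits_cache_limits( 'most-viewed', $most_viewed_articles ) could be checked before calling set_transient() or wp_cache_set(), with a log entry written when it returns false.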
Memcached, in all its power and glory, can be a pest at times. I hope by explaining some of the issues I have encountered and some of the strategies to avoid these pitfalls, you can avoid some troubles in the near future!
Comments
Gabriel Koen on
Awesome post. I love (talking about) caching.
> By storing frequently accessed data in memory as opposed to the disk or database
I think it’s also worth pointing out that it’s useful to cache calls to 3rd parties, but that could be an article in itself. :)
> My assumption is that this group argument was added to the WordPress memcached backend for compatibility with WordPress’ WP_Object_Cache class for run time caching that does support grouping cached values
There’s also another benefit to “cache groups” – simpler/easier to understand semantics. For example, with appropriate grouping I can have a “top-posts” cache in the “widget” group and in the “post” group – in both cases they are “top posts” but with different data, and the grouping takes care of the logical difference between the two. But it’s semantics.
Another use is an edge case trick, if you have a plugin with a custom group you can invalidate all the caches within that group by changing the group name. Useful for blowing out everything in a group when the data changes, or in a schedule if you have a dynamic group name, without invalidating your entire cache.
> It becomes really important to understand these limits especially when developing using the transient functions in WordPress.
Since you mention transients, it’s probably worth mentioning your view on the conceptual difference between using transients vs WP cache. That’s also a pretty expansive topic unto itself, and it was recently discussed at length at the WordPress VIP workshop. The takeaway from that discussion, which I agree with, is that transients are tricky due to the built-in expiry. When writing a public plugin (for the .org repo for example) it’s often better to use transients because you know it will be stored somewhere between requests, whereas you don’t always know if WP cache is using a persistent store like memcached. But you have to balance that: transient expiries with memcached are tricky since you have multiple points when the data can expire (cache timeout, transient TTL, cache eviction) and it’s mildly overloading their original purpose (guaranteed store for data that changes regularly, typically 3rd party data like feeds).
Zack on
Glen,
Thanks for your comments! Great points all around! As a general comment on all of your points, yes, all of these could be very expansive articles in and of themselves. My hope was to give people who are working in a memcached environment some pointers given the types of things we typically deal with.
Regarding groups, I would love it if they went above and beyond semantics and were baked right into memcached. It would be so awesome to be able to invalidate a whole group at once with something like a “delete_group” method. The only problem I have with invalidating a group by changing the group name is that you then lose the “semantic” value ;) A similar approach that I will often use is putting a changeable slug in the cache key (it’s essentially the same concept) that is updated to invalidate the group of cached values. You can see an example of it here: https://gist.github.com/2864688. Both approaches work really well and are important to think through early in the development stage.
Regarding the transients, you are absolutely correct…very tricky. Transients are best for distributed code because they’ll take advantage of whatever the WP install has available (object cache, database). The problem is that the storage requirements (key length and value size) for memcached and the wp_options table are different, so I usually try to use the “least common denominator”: 1 MB for the value, 45 characters for the key. It helps me avoid some issues that are very difficult to debug. Dealing with evictions in the object cache is also a very different beast from dealing with transients, in that object-cached values are evicted unpredictably.
Thanks for your comments!
Zack on
Sorry…about the “Glen” above…was too excited about your comment that I seemed to combine Gabriel Koen into Glen. Thanks Gabriel!
The Frosty on
Looking into this, curious if it’s necessary to create a cached object in WP when doing a query to invoke best cache practices, or if something like W3 Total Cache can help with database queries with memcached.
Yiftach on
AFAIK, since version 1.4.14, the max size of a Memcached object is 500MB.
See more details here:
https://groups.google.com/forum/?fromgroups=#!topic/memcached/MOfjAseECrU
Jake Goldman on
Good heads up, though most configurations will probably still err on the side of smaller sizes.