Keeping the Cache Hot

15 Nov

Problem

The expiry of content in a caching architecture is only detected when a user requests data that has already expired. As a result, a percentage of visitors to the site get no benefit from the cache. A typical dynamic site also uses several different caching architectures, so the solution needs to be cache agnostic.

Architecture

Emmao bot is the name given to the Python program that keeps the cache hot.

Figure 1: Emmaobot Server UML Model

Solution

Emmao bot is built to act as a user agent and request pages. mod_python is used to make the Apache children log their requests in a special format. Emmao bot runs in the background as a daemon process, either on the webhead or on a separate server. It examines the special Apache log files and adds an event for the time at which each logged page expires. libevent is used to manage these events. Pages are ranked based on analysis carried out while Emmao bot runs. The rank is used to ensure that the most important, popular, or heavy pages never expire. If there is a limit on the number of pages to manage, the rank also decides which pages to ignore.
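To make the log-to-event step more concrete, here is a minimal sketch of the idea, not the actual Emmao bot code. It assumes a hypothetical log format of "<epoch_seconds> <ttl_seconds> <url>" per line (the real special format is not described here) and uses Python's heapq as a stand-in for libevent timers. The names parse_log_line, RefreshScheduler and lead_seconds are made up for illustration.

```python
import heapq


def parse_log_line(line):
    """Parse one line of the assumed format '<epoch_seconds> <ttl_seconds> <url>'."""
    fetched_at, ttl, url = line.split(None, 2)
    return float(fetched_at), float(ttl), url.strip()


class RefreshScheduler:
    """Keeps one pending refresh per URL, ordered by when it is due."""

    def __init__(self, lead_seconds=5.0):
        # Refresh slightly before the expiry so a user never sees the miss.
        self.lead_seconds = lead_seconds
        self._heap = []    # (refresh_at, url) pairs, earliest first
        self._latest = {}  # url -> most recently scheduled refresh_at

    def observe(self, line):
        """Schedule (or reschedule) a refresh for the page named in a log line."""
        fetched_at, ttl, url = parse_log_line(line)
        refresh_at = fetched_at + ttl - self.lead_seconds
        self._latest[url] = refresh_at
        heapq.heappush(self._heap, (refresh_at, url))

    def due(self, now):
        """Return the URLs whose refresh time has passed, earliest first."""
        urls = []
        while self._heap and self._heap[0][0] <= now:
            refresh_at, url = heapq.heappop(self._heap)
            # Ignore entries superseded by a later request for the same URL.
            if self._latest.get(url) == refresh_at:
                del self._latest[url]
                urls.append(url)
        return urls
```

A daemon loop would feed each new log line to observe(), call due() with the current time, and request every returned URL as a user agent. That request is itself logged by Apache, which schedules the next refresh for the page.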

The Cost

Although the number of pages that Emmao bot manages can be capped to limit load on the web server, Emmao bot still increases traffic.
In live production environments with Emmao bot managing 10,000 pages, I have not found the performance cost to outweigh the benefit of reducing the maximum user fetch time.
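As a rough illustration of capping the managed set by rank, the sketch below keeps only the top-ranked pages. The scoring formula is an assumption made for the example (hits multiplied by mean fetch time, which equals the total seconds spent generating the page), not the ranking Emmao bot actually uses. PageRanker and its method names are hypothetical.

```python
import heapq
from collections import defaultdict


class PageRanker:
    """Ranks pages by total generation time observed (assumed scoring)."""

    def __init__(self):
        self._total_seconds = defaultdict(float)

    def record(self, url, fetch_seconds):
        """Account for one observed request and how long it took to serve."""
        self._total_seconds[url] += fetch_seconds

    def rank(self, url):
        """Score for a page; pages never seen score zero."""
        return self._total_seconds.get(url, 0.0)

    def managed(self, limit):
        """Return up to `limit` URLs to keep hot, highest rank first."""
        return heapq.nlargest(
            limit, self._total_seconds, key=self._total_seconds.get
        )
```

Pages outside the managed() list are simply left to expire and be regenerated on the next real user request, which bounds the extra traffic the bot can create.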

Links

LibEvent http://monkey.org/~provos/libevent

ModPython http://www.modpython.org/
