Latent Semantic Analysis (LSA) is a mathematical method that tries to bring out latent relationships within a collection of documents. Rather than looking at each document in isolation, it looks at all the documents as a whole, and the terms within them, to identify relationships.

An example of LSA:
Using a search engine, search for “sand”.

Documents are returned which do not contain the search term “sand” but contain terms like “beach”.

LSA has identified a latent relationship: “sand” is semantically close to “beach”.

There are some very good papers which describe LSA in detail:

This is an implementation of LSA in Python (2.4+). Thanks to scipy it's rather simple!
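To give a feel for the idea, here is a minimal sketch of LSA on a toy corpus. It is not the implementation from this post: it assumes a modern Python with numpy (rather than scipy on 2.4), uses raw term counts with no weighting, and the function names and sample documents are made up for illustration.

```python
import numpy as np

def term_vectors(documents, rank=2):
    """Map each term to its coordinates in a rank-reduced 'concept' space."""
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    row = {word: i for i, word in enumerate(vocabulary)}

    # Term-document matrix: one row per term, one column per document.
    counts = np.zeros((len(vocabulary), len(documents)))
    for col, doc in enumerate(documents):
        for word in doc.lower().split():
            counts[row[word], col] += 1

    # Singular value decomposition; keeping only the largest `rank`
    # singular values is what exposes the latent relationships.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    reduced = u[:, :rank] * s[:rank]
    return {word: reduced[row[word]] for word in vocabulary}

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / norm) if norm else 0.0

# "sand" and "beach" never appear in the same document,
# but both appear alongside "dune".
docs = ["sand dune", "dune beach", "python code", "code scipy"]
vectors = term_vectors(docs, rank=2)
sand_beach = cosine(vectors["sand"], vectors["beach"])
sand_python = cosine(vectors["sand"], vectors["python"])
```

On this toy corpus `sand_beach` comes out close to 1 while `sand_python` comes out close to 0, which is the "sand"/"beach" effect described above in miniature.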


A vector space search involves converting documents into vectors. Each dimension within the vectors represents a term. If a document contains that term then the value within the vector is greater than zero.

Here is an implementation of vector space searching using Python (2.4+).
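As a rough illustration of the idea (a sketch written for a modern Python, not the code from the full post), documents and the query can be turned into term-count vectors and compared by cosine similarity. The function names and the whitespace tokenisation are assumptions made for this example.

```python
import math
from collections import Counter

def to_vector(text, vocabulary):
    """One dimension per vocabulary term; the value is how often the term occurs."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, documents):
    """Return (score, document) pairs with a non-zero score, best first."""
    vocabulary = sorted({term for doc in documents for term in doc.lower().split()})
    query_vector = to_vector(query, vocabulary)
    scored = [(cosine(query_vector, to_vector(doc, vocabulary)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, doc) for score, doc in scored if score > 0]

documents = ["the cat sat", "the dog barked", "cat and dog"]
results = search("cat", documents)
```

Only documents that share at least one term with the query get a score above zero, so "the dog barked" is not returned for the query "cat".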

The event calculus planner used within my thesis was based on Dr. Murray Shanahan’s ASLDICN (Abductive SLD with Integrity constraints and proof by Negation) planner with compound action support. This planner is adapted from one published in one of Dr. Shanahan’s research papers:

http://casbah.ee.ic.ac.uk/%7Empsha/planners.html

The original planner only supports the generation of a single plan. I needed to support conditional planning, so I wanted the planner to generate multiple plans representing the different ways of reaching the goal. The problem was how to convert the planner to generate all possible plans while ensuring that this does not cause infinite looping and that no redundant plans are generated.

My version of the planner adds the following features:

  • Conditional Planning
  • Impossible Predicate
  • Occured And NotOccured predicates


Running Prolog as CGI

November 23rd, 2007

Prolog can be run as CGI by using a PHP wrapper script which invokes the Prolog engine from within PHP. The invocation specifies which Prolog files to load and which goals to run once they are loaded.

Prolog Functioning As CGI

Executing the following in PHP can spawn a process which runs Prolog.

$cgiOutput = `sicstus --goal $goal. -l "$cgiPrologScriptToLoad"`;

This specific example is for SICStus, but most Prolog command lines have a similar format. Another possibility is to set up Prolog itself as CGI, since any language can be CGI. I was running my code on a Windows box and found it impossible to have Prolog direct its output to the command line and capture it for returning. If you’re going the Unix route you may want to look at PiLLoW’s guide.

For form postings you can catch the post in PHP or another scripting language and create a Prolog-formatted file which is passed to the Prolog script when it is invoked.
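Here is a small sketch of what writing posted form fields out as Prolog facts could look like. It is in Python rather than PHP, and the predicate name `form_field/2` and the quoting rules (ISO-style backslash escapes inside single-quoted atoms) are assumptions for illustration, not what my wrapper actually used.

```python
def prolog_atom(text):
    """Quote an arbitrary string as a single-quoted Prolog atom."""
    escaped = (
        text.replace("\\", "\\\\")   # escape backslashes first
            .replace("'", "\\'")     # then single quotes
            .replace("\n", "\\n")    # keep each fact on one line
    )
    return "'" + escaped + "'"

def form_to_prolog(fields, predicate="form_field"):
    """Render a dict of posted form fields as Prolog facts, one per line."""
    lines = [
        "%s(%s, %s)." % (predicate, prolog_atom(name), prolog_atom(fields[name]))
        for name in sorted(fields)
    ]
    return "\n".join(lines) + "\n"

facts = form_to_prolog({"name": "O'Brien", "age": "42"})
# facts now holds:
# form_field('age', '42').
# form_field('name', 'O\'Brien').
```

The resulting text can be written to a temporary file and loaded by the Prolog script alongside the main program, so the goal can query the posted values as ordinary facts.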

You may want to have Prolog maintain state. This can be achieved by using a database. The database that I have used is Berkeley DB, which SICStus has built-in support for.

Dynamic Getter/Setters for PHP

November 23rd, 2007

We use the magic __call method in PHP, which is invoked on an object when a method that has not been declared is called on it. This behaviour gives us default getters/setters; if we want specific behaviour for a get/set we just add the method to the class, and __call will no longer be used for that class attribute.

class GetSetExample {

    /**
     * Dynamic getters and setters that maintain the getX and setX format. They can be
     * overridden if custom processing is needed.
     *
     * @param string $method
     * @param array $arguments
     * @return mixed
     */
    function __call($method, $arguments) {

        # Is this a get or a set?
        $prefix = strtolower(substr($method, 0, 3));

        # Which class attribute is being read or written?
        $property = substr($method, 3);

        if (empty($prefix) || empty($property)) { # Did not match a get/set call
            throw new Exception("Calling a non get/set method that does not exist: $method");
        }

        # Check if the get/set parameter exists within this class as an attribute
        # (the comparison is case-insensitive, so getFoo matches $foo)
        $match = false;
        foreach ($this as $class_var => $class_var_value) {
            if (strtolower($class_var) == strtolower($property)) {
                $property = $class_var;
                $match = true;
            }
        }

        # Get attribute
        if ($match && $prefix == "get" && (isset($this->$property) || is_null($this->$property))) {
            return $this->$property;
        }

        # Set
        if ($match && $prefix == "set") {
            $this->$property = $arguments[0];
        }
        elseif (!$match && $prefix == "set") {
            throw new Exception("Setting a variable that does not exist: var:$property value: $arguments[0]");
        }
        else {
            throw new Exception("Calling a get/set method that does not exist: $property");
        }
    }

}

Funkload Build script

November 23rd, 2007

Funkload is an open source, Python-based unit testing tool that also serves as a good tool for load testing. We can use it to create a unit test which simulates a user browsing through a site. To test load, run two simultaneous instances of the unit test, then keep scaling up the number of concurrent instances.

Official site: http://funkload.nuxeo.org/

I have written a Python-based Funkload build script which:
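The scaling-up idea can be sketched in plain Python. This does not use Funkload's own API: `user_session` stands in for whatever simulates one user browsing the site, and the function names and step sizes are made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_step(user_session, concurrent_users):
    """Run `concurrent_users` copies of one session at once; return each duration."""
    def timed_session(_):
        start = time.time()
        user_session()
        return time.time() - start

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(timed_session, range(concurrent_users)))

def ramp_up(user_session, steps=(1, 2, 4)):
    """Repeat the test with more and more concurrent users; record the slowest session."""
    return {users: max(run_load_step(user_session, users)) for users in steps}

# Stand-in for a real browsing session: it only counts how often it was called.
calls = []
calls_lock = threading.Lock()

def fake_session():
    with calls_lock:
        calls.append(1)

slowest_by_users = ramp_up(fake_session, steps=(1, 2, 3))
```

Watching how the slowest session time grows from one step to the next gives a first impression of how the site copes as concurrency rises.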

  • Builds the Funkload configuration for multiple sites
  • Uses wget to generate a sample of pages for load testing
  • Runs load tests
  • Builds HTML documentation from test results.


Download PDF Thesis http://www.doc.ic.ac.uk/teaching/projects/Distinguished04/JosephWilk.pdf

This project took the HTML form systems as a model and built a Workflow Management System that uses artificial intelligence planning methodologies and Event Calculus workflow specifications to try to overcome some of the problems of Workflow Management Systems. Logic, server side languages and planning all rolled into one.

iWFMS admin interface

The development of the Workflow Management System with AI uncovered interesting issues in modelling situations in the Event Calculus and the problems that need to be overcome to use AI with workflow. The problems and solutions developed in the project cover a wide spectrum of domains, looking at logic programming, server-side languages and getting the two to talk to each other. Areas covered range from typing of HTML to new frameworks for Prolog running as CGI.

Achievements

  • Workflow specification language
    Using the Event Calculus and extensions to specify workflow.
  • HTML form typing
    A typing engine for ensuring that the HTML form element specifications are
    correct when used in workflow specifications.
  • A Visualisation tool for Event Calculus plans
    A tool that generates Scalable Vector Graphic graphs for Event Calculus
    plans.
  • A HTML/PHP iWFMS engine
    Using the plans generated from the workflow specifications to support the
    running and management of a system.
  • A JavaScript plan execution engine
    Facilitates the following of workflow plans in a scripting language that runs
    while the user is viewing and interacting with a web page.
  • Logic programming running as Common Gateway Interface (CGI)
    A framework for the use of high-level declarative programming languages
    functioning as CGI.
  • Logic programming and server-side language interaction model
    An Interaction model allowing server-side languages used for generating web
    pages to interact with logic programming languages.
  • A Hospital model working example
    An example of how the specification can be utilised for a real world scenario in
    a hospital, providing the full functionality within the iWFMS to run and manage
    this system.

The task was to use the MSSQL database adapter (Zend_Db_Adapter_Pdo_Mssql) from the Zend Framework and ensure it worked on both Windows and Unix platforms.

The PDO drivers were a little tricky. The main problem we found was that different drivers require dates to be inserted in different formats, and the dates that come back from the db are in different formats too.

Aside from having to deal with dates (which we handled as suggested by Bill Karwin: we use Zend_Date and convert it to a string date at the last possible point):
http://framework.zend.com/issues/browse/ZF-181

and the limit function not working:
http://framework.zend.com/issues/browse/ZF-1037

we have had no other problems connecting to MSSQL 2005 and MSSQL 2002 SQL Server.

The drivers we use are:
Windows
DB-LIB (MS SQL, Sybase) 5.1.6.6
http://pecl4win.php.net/

Unix
http://pecl.php.net/package/PDO_DBLIB

Under Unix we use our own Zend_Db_Adapter_Pdo_Dblib which just extends Zend_Db_Adapter_Pdo_Mssql. We do this just to change the date format for insertion (we store the format required for a PDO adapter in each Zend_Db_Adapter_Pdo_* and use that when converting the date from a Zend_Date to a string).
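The shape of that arrangement, sketched in Python rather than PHP: each adapter class carries the date format its driver wants, the Unix adapter only overrides that format, and the date object is converted to a string at the last moment. The class names and both format strings here are invented for illustration; they are not the formats the real drivers require.

```python
from datetime import datetime

class MssqlAdapter:
    # Hypothetical insert format for the Windows driver.
    insert_date_format = "%Y-%m-%d %H:%M:%S"

    def date_for_insert(self, value):
        """Convert a datetime to the string this driver expects, as late as possible."""
        return value.strftime(self.insert_date_format)

class DblibAdapter(MssqlAdapter):
    # Same behaviour as the parent; only the (hypothetical) date format differs.
    insert_date_format = "%Y%m%d %H:%M:%S"

when = datetime(2007, 11, 23, 14, 5, 9)
windows_value = MssqlAdapter().date_for_insert(when)  # "2007-11-23 14:05:09"
unix_value = DblibAdapter().date_for_insert(when)     # "20071123 14:05:09"
```

Application code keeps working with real date objects throughout and never needs to know which driver is underneath.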

As far as PHP/PDO is concerned -
Under Windows PHP runs pdo (mssql)
Under Unix PHP runs pdo (dblib)

I would be interested to hear what the performance is like using ODBC to talk to MSSQL Server. Looking at all the problems we had with dates and inconsistent drivers, going the ODBC route does seem appealing just to get some consistency.

Keeping the Cache Hot

November 15th, 2007

Problem

The expiry of content within caching architectures is only identified when a user makes a request for expired data. Hence a percentage of the visitors to the site will not be able to take advantage of caching. Many different caching architectures are used within a typical dynamic site, so the solution needs to be cache agnostic.

Architecture

Emmao bot is the name given to the Python program which is used to keep the cache hot.

Figure 1: Emmaobot Server UML Model

Solution

Emmao bot has been built to act as a user agent and request pages. mod_python is used to make the Apache children log their requests in a special format. Emmao bot runs in the background as a daemon process, either on the webhead or on another server. It examines the special Apache log files and adds an event for when each logged page expires. libevent is used to manage these events. Pages are ranked based on analysis as Emmao bot runs, and it uses the rank to ensure that the most important/popular/heavy pages never expire. If there is a limit on the number of pages to focus on, rank can also be used to decide which pages to ignore.
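A stripped-down sketch of the scheduling idea, using a heap from the Python standard library in place of libevent. The class name, the hit-count ranking and the fixed time-to-live argument are simplifications assumed for this example; the real bot's ranking and log parsing are not shown.

```python
import heapq

class RefreshScheduler:
    """Track when cached pages expire and which ones are worth keeping hot."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.rank = {}     # url -> number of logged requests (a crude rank)
        self.expires = {}  # url -> latest known expiry time
        self.events = []   # heap of (expiry_time, url)

    def record_request(self, url, now, ttl):
        """Called for each log entry: bump the rank and schedule an expiry event."""
        self.rank[url] = self.rank.get(url, 0) + 1
        self.expires[url] = now + ttl
        heapq.heappush(self.events, (now + ttl, url))

    def managed_pages(self):
        """Only the highest-ranked pages are kept hot; the rest are ignored."""
        ranked = sorted(self.rank, key=lambda url: (-self.rank[url], url))
        return set(ranked[:self.max_pages])

    def due(self, now):
        """Pop expiry events up to `now`; return the URLs that should be re-requested."""
        keep = self.managed_pages()
        urls = []
        while self.events and self.events[0][0] <= now:
            when, url = heapq.heappop(self.events)
            # Skip low-ranked pages and events made stale by a later request.
            if url in keep and self.expires.get(url) == when:
                urls.append(url)
        return urls

scheduler = RefreshScheduler(max_pages=2)
scheduler.record_request("/home", now=0, ttl=60)
scheduler.record_request("/home", now=10, ttl=60)
scheduler.record_request("/news", now=5, ttl=30)
scheduler.record_request("/about", now=6, ttl=30)
first_batch = scheduler.due(now=40)    # ["/about"]
second_batch = scheduler.due(now=100)  # ["/home"]
```

With a limit of two pages, "/news" loses the rank tie-break and is ignored, and the stale first event for "/home" is skipped because a later request already pushed its expiry back.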

The Cost

Although the number of pages that Emmao bot manages can be set to limit load on the webserver, there is still an increase in traffic due to Emmao bot.
In live production environments with Emmao bot managing 10,000 pages I have not found the performance cost to outweigh the benefit of reducing maximum user fetch time.

Links

LibEvent http://monkey.org/~provos/libevent

ModPython http://www.modpython.org/

Squid and members

November 15th, 2007

Task

Use Squid to manage a cache for a website where there are member users (logged in to site) and public users. Squid must cache both member views of a page and public views.

Squid needs to check the authentication of the user and decide whether it should redirect them to a cache for members or for public users. There are only two discrete sets of users, and any content that is specific to an individual user is handled via AJAX.

  • Squid will be operating as a transparent proxy.
  • Usernames/Passwords are stored within an MSSQL database.
  • Squid is hosted on a Unix box along with Apache.
