Funkload Build script

23 Nov

Funkload is an open source, Python-based unit testing tool that also works well for load testing. We can use it to write a unit test that simulates a user browsing through a site. To test load, run two simultaneous instances of the unit test, then keep scaling up the number of concurrent instances.
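
The scaling idea can be sketched with nothing but the Python standard library. This is not Funkload's API: browse_site is a hypothetical stand-in for one run of the unit test, and the sleep is a placeholder for real page requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def browse_site(user_id):
    """Hypothetical stand-in for one unit test run simulating a user session."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for the real page requests
    return time.perf_counter() - start


def run_cycle(concurrent_users):
    """Run the test once per simulated user, all at the same time."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(browse_site, range(concurrent_users)))


def run_load_test(cycles=(1, 2, 4)):
    """Scale up the number of concurrent users and record the mean duration."""
    results = {}
    for users in cycles:
        durations = run_cycle(users)
        results[users] = sum(durations) / len(durations)
    return results


if __name__ == "__main__":
    for users, mean in run_load_test().items():
        print(f"{users} concurrent users: {mean:.3f}s mean")
```

Against a real site the mean duration per cycle is the number to watch as the user count grows.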

Official site: http://funkload.nuxeo.org/

I have written a Python-based Funkload build script which:

  • Builds the Funkload configuration for multiple sites
  • Uses wget to generate a sample of pages for load testing
  • Runs load tests
  • Builds HTML documentation from test results

(more…)

Intelligent Workflow Management System

22 Nov

Download the thesis (PDF): http://www.doc.ic.ac.uk/teaching/projects/Distinguished04/JosephWilk.pdf

This project took HTML form systems as a model and built a Workflow Management System that uses artificial intelligence planning methodologies and Event Calculus workflow specifications to try to overcome some of the problems of Workflow Management Systems. Logic, server-side languages and planning, all rolled into one.

iWFMS admin interface

The development of the Workflow Management System with AI uncovered interesting issues in modelling situations in the Event Calculus, and the problems that need to be overcome to use AI with workflow. The problems and solutions developed in the project cover a wide spectrum of domains, looking at logic programming, server-side languages and getting the two to talk to each other. Areas covered range from typing of HTML to new frameworks for Prolog running as CGI.

Achievements

  • Workflow specification language
    Using the Event Calculus and extensions to specify workflow.
  • HTML form typing
    A typing engine for ensuring that the HTML form element specifications are
    correct when used in workflow specifications.
  • A Visualisation tool for Event Calculus plans
    A tool that generates Scalable Vector Graphics (SVG) graphs for Event Calculus
    plans.
  • An HTML/PHP iWFMS engine
    Using the plans generated from the workflow specifications to support the
    running and management of a system.
  • A JavaScript plan execution engine
    Facilitates the following of workflow plans in a scripting language that runs
    while the user is viewing and interacting with a web page.
  • Logic programming running as Common Gateway Interface (CGI)
    A framework for the use of high-level declarative programming languages
    functioning as CGI.
  • Logic programming and server-side language interaction model
    An interaction model allowing server-side languages used for generating web
    pages to interact with logic programming languages.
  • A Hospital model working example
    An example of how the specification can be utilised for a real world scenario in
    a hospital, providing the full functionality within the iWFMS to run and manage
    this system.

PDO & Zend Framework Playing nicely with MSSQL

15 Nov

The task was to use the MSSQL database adapter (Zend_Db_Adapter_Pdo_Mssql) from the Zend Framework and ensure it worked on both Windows and Unix platforms.

The PDO drivers were a little tricky. The main problem we found was that different drivers require dates to be inserted in different formats, and the dates that come back from the database are also in different formats.

We handled dates as suggested by Bill Karwin: we use Zend_Date throughout and only convert it to a date string at the last point:
http://framework.zend.com/issues/browse/ZF-181

The limit function also does not work:
http://framework.zend.com/issues/browse/ZF-1037

Aside from these two issues, we have had no other problems connecting to MSSQL 2005 and MSSQL 2002 SQL Server.

The drivers we use are:
Windows
DB-LIB (MS SQL, Sybase) 5.1.6.6
http://pecl4win.php.net/

Unix
http://pecl.php.net/package/PDO_DBLIB

Under Unix we use our own Zend_Db_Adapter_Pdo_Dblib which just extends Zend_Db_Adapter_Pdo_Mssql. We do this just to change the date format for insertion (we store the format required for a PDO adapter in each Zend_Db_Adapter_Pdo_* and use that when converting the date from a Zend_Date to a string).
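
The idea of keeping one date format per adapter and converting only at the last moment can be sketched in a few lines. This is written in Python rather than PHP, and it is only an illustration: the adapter keys and both format strings are assumptions, since the real strings depend on the driver and the server settings.

```python
from datetime import datetime

# Hypothetical formats, one per adapter. The real strings depend on the
# driver and on the server settings.
DATE_FORMATS = {
    "pdo_mssql": "%Y-%m-%d %H:%M:%S",
    "pdo_dblib": "%b %d %Y %I:%M:%S %p",
}


def format_for_insert(value, adapter):
    """Convert a date object to the string a given adapter expects."""
    return value.strftime(DATE_FORMATS[adapter])


def parse_from_db(text, adapter):
    """Convert a date string returned by a given adapter back to an object."""
    return datetime.strptime(text, DATE_FORMATS[adapter])


if __name__ == "__main__":
    posted = datetime(2007, 11, 15, 14, 30, 0)
    for adapter in DATE_FORMATS:
        print(adapter, format_for_insert(posted, adapter))
```

The application code only ever sees date objects, so swapping the driver means changing one format string.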

As far as PHP/PDO is concerned:
Under Windows PHP runs PDO (mssql)
Under Unix PHP runs PDO (dblib)

I would be interested to hear what the performance is like using ODBC to talk to MSSQL Server. Looking at all the problems we had with dates and inconsistent drivers, going the ODBC route does seem appealing just to get some consistency.

Keeping the Cache Hot

15 Nov

Problem

The expiry of content within caching architectures is only identified when a user makes a request for expired data. Hence a percentage of the visitors to the site will not be able to take advantage of caching. Many different caching architectures are used within a typical dynamic site, so the solution needs to be cache agnostic.

Architecture

Emmao bot is the name given to the Python program used to keep the cache hot.

Figure 1: Emmaobot Server UML Model

Solution

Emmao bot has been built to act as a user agent and request pages. mod_python is used to make the Apache children log their requests in a special format. Emmao bot runs in the background as a daemon process, either on the webhead or on an alternative server. It examines the special Apache log files and adds an event for when each logged page expires. libevent is used to manage these events. Pages are given different rankings based on analysis as Emmao bot runs. It uses these rankings to ensure that the most important, popular or heavy pages never expire. If there is a limit on the number of pages to focus on, rank can also be used to decide which pages to ignore.
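
A minimal sketch of the ranking and expiry bookkeeping, assuming rank is a plain hit count and every page has a known time to live. It is not the Emmao bot code: the class and method names are hypothetical, libevent is left out, and the actual re-request of a due page is not shown.

```python
import heapq


class HotCacheScheduler:
    """Track logged requests and report which pages to refresh before expiry."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.rank = {}     # url -> hit count, a simple stand-in for ranking
        self.expires = {}  # url -> time at which the cached copy expires

    def record_request(self, url, now, ttl):
        """Called for each log entry: bump the rank and note the expiry time."""
        self.rank[url] = self.rank.get(url, 0) + 1
        self.expires[url] = now + ttl

    def managed_pages(self):
        """The highest ranked pages, limited to max_pages."""
        return heapq.nlargest(self.max_pages, self.rank, key=self.rank.get)

    def due(self, now, lead=5):
        """Managed pages expiring within `lead` seconds, soonest first."""
        return sorted(
            (u for u in self.managed_pages() if self.expires[u] - lead <= now),
            key=self.expires.get,
        )


if __name__ == "__main__":
    scheduler = HotCacheScheduler(max_pages=2)
    for url, hits, ttl in (("/home", 3, 60), ("/news", 2, 30), ("/about", 1, 10)):
        for _ in range(hits):
            scheduler.record_request(url, now=0, ttl=ttl)
    print(scheduler.due(now=26))
```

With a limit of two pages the lowest ranked page is ignored entirely, which is the trade-off described above.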

The Cost

Although the number of pages that Emmao bot manages can be set to limit load on the webserver, there is still an increase in traffic due to Emmao bot.
In live production environments with Emmao bot managing 10,000 pages, I have not found the performance cost to outweigh the benefit of reducing maximum user fetch time.

Links

LibEvent http://monkey.org/~provos/libevent

ModPython http://www.modpython.org/

Squid and members

15 Nov

Task

Use Squid to manage a cache for a website where there are member users (logged in to the site) and public users. Squid must cache both member views of a page and public views.

Squid needs to check the authentication of the user and decide whether it should redirect them to a cache for members or for public users. There are only two discrete sets of users, and any content that is specific to an individual user is handled via AJAX.

  • Squid will be operating as a transparent proxy.
  • Usernames/passwords are stored within an MSSQL database.
  • Squid is hosted on a Unix box along with Apache.
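
Because there are only two sets of users, the routing decision reduces to picking one of two cache namespaces per request. The sketch below illustrates only that decision and is not Squid configuration. The function names are hypothetical, and the set of member sessions stands in for the real credential check against the MSSQL database.

```python
def audience_for(session_id, member_sessions):
    """Return 'member' for an authenticated session, otherwise 'public'."""
    return "member" if session_id in member_sessions else "public"


def cache_key(url, audience):
    """Keep member and public views of the same URL in separate cache entries."""
    return f"{audience}:{url}"


if __name__ == "__main__":
    members = {"abc123"}  # stand-in for sessions verified against the database
    for session in ("abc123", None):
        print(cache_key("/news", audience_for(session, members)))
```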

(more…)

OpenId

11 Nov

OpenID is an open, loosely distributed single sign-on protocol. It looks at why Microsoft’s single sign-on has not taken off on a large scale, concluding that no one wants a single company storing all their details, and so creates a distributed single sign-on protocol.

OpenIDs take the form of URLs:

exampleuser.livejournal.com

OpenID 1.1 Protocol Summary

OpenID specifications: http://openid.net/specs.bml

The OpenID 1.1 protocol specification in summary:

  • Identify the Identity Provider associated with the OpenID submitted by the End User.
  • Agree a shared key between the Consumer and Identity Provider.
  • Redirect the End User to the Identity Provider to authenticate themselves with a password.
  • End User gets redirected back to Consumer with authentication data signed by the shared key.
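
The last step, checking data signed with the shared key, can be sketched as follows. As I understand the 1.1 specification, the signature is an HMAC-SHA1 over the signed fields written as key:value lines, but treat this as a simplified illustration and not a conforming implementation. The field values and the secret are made up.

```python
import base64
import hashlib
import hmac


def sign(fields, signed_keys, secret):
    """Sign the chosen fields, in order, with the shared key."""
    token = "".join(f"{k}:{fields[k]}\n" for k in signed_keys).encode("utf-8")
    digest = hmac.new(secret, token, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(fields, signed_keys, secret, signature):
    """Consumer side: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(fields, signed_keys, secret), signature)


if __name__ == "__main__":
    secret = b"shared-key-agreed-earlier"  # made-up value
    fields = {
        "mode": "id_res",
        "identity": "http://exampleuser.livejournal.com/",
        "return_to": "http://consumer.example/return",  # made-up URL
    }
    keys = ["mode", "identity", "return_to"]
    signature = sign(fields, keys, secret)
    print(verify(fields, keys, secret, signature))
```

If any signed field is changed on the way back to the Consumer, the recomputed signature no longer matches.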

(more…)

Curl and Certificates with Windows PHP

6 Nov

Curl on a Windows PHP installation does not know where to look for certificates, so when you try to curl an HTTPS URL it fails. The default value for CURLOPT_SSL_VERIFYPEER is true, which means curl will always try to validate the SSL certificate by default. I discovered this while working with an OpenID library (v1.2.3):
http://openidenabled.com/php-openid/

There is the option of disabling the verification.


$ch = curl_init();
// set URL and other appropriate options
// disable peer certificate verification (insecure)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

But that's ignoring the problem and opening a security hole! Instead, download a reputable certificate bundle file, for example:

http://curl.haxx.se/docs/caextract.html

Then set CURLOPT_CAINFO with the location of your certificate bundle.


// only Windows needs to be told where the certificate bundle lives
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
    curl_setopt($ch, CURLOPT_CAINFO, 'C:/certificates/cacert.pem');
}

Automatic Tag Generation

22 Oct

This project looked at dynamically generating suggestion tags for content. To simplify the task, some constraints were introduced.

  • The content which will be tagged is news articles with HTML markup.
  • Only English content.

I used the following HTML page to experiment with suggestion tags: http://news.bbc.co.uk/1/hi/entertainment/6624223.stm

To help evaluate the tagging methods I asked a sample of people to suggest what they thought the best tags would be. They came up with:

paris, hilton, paris hilton, jail, jail sentence, drink-driving
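
A simple baseline to compare against a human list like this is plain word frequency. The sketch below is only an assumed baseline, not the method this project used: it strips the HTML markup, drops a small hand-picked English stopword list, and suggests the most frequent remaining words. It also makes no attempt to skip script or style content.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Small hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "was",
             "her", "has", "been", "on", "at", "by", "with"}


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def suggest_tags(html, limit=5):
    """Return the most frequent non-stopword words in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z][a-z-]*", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    sample = ("<h1>Hilton jail sentence</h1>"
              "<p>Paris Hilton was sent to jail. Hilton will appeal.</p>")
    print(suggest_tags(sample, limit=2))
```

Single words cannot produce tags such as "paris hilton" or "jail sentence", so a fuller version would also need to count word pairs.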

(more…)