Latent Semantic Analysis in Ruby

November 21st, 2008

I’ve had lots of requests for a Ruby version to follow up my Latent Semantic Analysis in Python article, so I’ve rewritten the code and article for Ruby. This time I wrote LSA from scratch, test-driven, so it has some subtle differences from the Python version.

What is LSA?

Latent Semantic Analysis (LSA) is a mathematical method that tries to bring out latent relationships within a collection of documents. Rather than looking at each document in isolation, it looks at all the documents as a whole, and at the terms within them, to identify relationships.

An example of LSA:
Using a search engine, search for “ruby”.

Documents are returned which do not contain the search term “ruby” but contain terms like “rails”.

LSA has identified a latent relationship: “ruby” is semantically close to “rails”.

How does it work?

Given a set of documents, each word in those documents represents a point in the semantic space. LSA uses a mathematical technique called singular value decomposition (SVD) to take the documents and words, represented as a matrix, and produce a reduced approximation of this matrix. In doing so it reduces the overall noise in the semantic space, bringing words closer together. Hence, after applying LSA, some words share similar points in the semantic space: they are semantically similar.

These groups of semantically similar words form concepts and those concepts in turn relate to documents.

Term a <----------->
Term b <-----------> Concept d <---------> Document e
Term c <----------->
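
To make the matrix reduction concrete, here is a minimal sketch in plain Ruby. It is not the code from the article: the toy corpus and every method name are assumptions made up for illustration, and the singular value decomposition is approximated with power iteration and deflation so that no library is needed. It builds a term-document count matrix, keeps the two strongest singular components, and prints the reduced approximation row by row.

```ruby
# Toy corpus: hypothetical documents, not taken from the article.
DOCS = [
  "ruby rails web",
  "ruby rails framework",
  "python django web"
].freeze

# Term-document count matrix: one row per term, one column per document.
def term_document_matrix(docs)
  terms = docs.flat_map(&:split).uniq.sort
  rows = terms.map { |term| docs.map { |doc| doc.split.count(term).to_f } }
  [terms, rows]
end

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def norm(v)
  Math.sqrt(dot(v, v))
end

# Matrix-vector products on plain nested arrays.
def times(m, v)
  m.map { |row| dot(row, v) }
end

def transpose_times(m, v)
  m.transpose.map { |col| dot(col, v) }
end

# Top-k singular triplets [sigma, u, v], found by power iteration with deflation
# (an assumption of this sketch; a real implementation would use a tested SVD routine).
def truncated_svd(matrix, k, iterations: 200)
  residual = matrix.map(&:dup)
  Array.new(k) do
    v = Array.new(matrix.first.size) { |i| 1.0 / (i + 1) } # fixed start vector
    iterations.times do
      w = transpose_times(residual, times(residual, v))
      length = norm(w)
      break if length.zero?
      v = w.map { |x| x / length }
    end
    av = times(residual, v)
    sigma = norm(av)
    u = sigma.zero? ? av : av.map { |x| x / sigma }
    # Remove the component just found so the next pass finds the next one.
    residual = residual.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x - sigma * u[i] * v[j] }
    end
    [sigma, u, v]
  end
end

# Rebuild the reduced approximation from the kept triplets.
def reconstruct(triplets, rows, cols)
  Array.new(rows) do |i|
    Array.new(cols) { |j| triplets.sum { |sigma, u, v| sigma * u[i] * v[j] } }
  end
end

terms, matrix = term_document_matrix(DOCS)
triplets = truncated_svd(matrix, 2)
reduced = reconstruct(triplets, matrix.size, matrix.first.size)
terms.each_with_index do |term, i|
  puts format("%-10s %s", term, reduced[i].map { |x| x.round(2) }.inspect)
end
```

Keeping fewer components than the matrix has columns is what produces the "reduced approximation": terms that occur in similar documents end up with similar rows.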

Read the rest of this entry »

We can think of JavaScript running within a client’s browser as a robotic agent. It has an environment in which it can sense things, and the ability to look at that environment and make decisions based on plans.

[Image: clientagent.JPG]

So why is that useful? Well, why is a robot useful? You can produce many different complex plans, give them to the robot and forget about it while it does the work, potentially over and over again. If we are really lucky the robot can demonstrate some intelligence and deal with uncertainty.

Well, I tried out a small part of this idea by building a server-side service which delivered plans in JavaScript to the client. The JavaScript planning agent followed the plans. It’s not an intelligent robot, but this is just a prototype. The plans were focused on validation conditions that a user needed to get through to post a form.
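
The idea of an agent following a validation plan can be sketched in a few lines. The prototype described here ran as JavaScript in the browser; the sketch below is written in Ruby purely for illustration, and the `Step` structure, the example plan and the method names are all hypothetical rather than taken from the prototype. The agent "senses" the current form values and reports the first condition the user still has to satisfy.

```ruby
# A plan step: a description plus a condition checked against the sensed form state.
Step = Struct.new(:description, :condition)

# Hypothetical plan, standing in for one delivered by the server.
PLAN = [
  Step.new("enter a name", ->(form) { !form[:name].to_s.strip.empty? }),
  Step.new("enter an email containing @", ->(form) { form[:email].to_s.include?("@") }),
  Step.new("accept the terms", ->(form) { form[:terms] == true })
].freeze

# Walk the plan in order and return the first step whose condition is not yet met,
# or nil when every condition holds.
def next_pending_step(plan, form)
  plan.find { |step| !step.condition.call(form) }
end

# The form may be posted only once no step is pending.
def ready_to_post?(plan, form)
  next_pending_step(plan, form).nil?
end

form = { name: "Ada", email: "ada" }
pending = next_pending_step(PLAN, form)
puts pending ? "Next: #{pending.description}" : "Ready to post"
```

Because the plan is just data plus conditions, the server can swap in a different plan without changing the agent that follows it.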

Read the rest of this entry »

The event calculus planner used within my thesis was based on Dr. Murray Shanahan’s ASLDICN (Abductive SLD with Integrity Constraints and proof by Negation) planner with compound action support. This planner is adapted from one published in one of Dr. Shanahan’s research papers:

http://casbah.ee.ic.ac.uk/%7Empsha/planners.html

The original planner only supports the generation of a single plan. I needed to support conditional planning, so I wanted the planner to generate multiple plans representing the different ways of reaching the goal. The problem was how to convert the planner to generate all possible plans while ensuring that this does not cause infinite looping and that no redundant plan solutions are generated.
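
The two constraints, no infinite looping and no redundant plans, can be illustrated with a much simpler technique than the one used in the thesis. The sketch below is a plain depth-first search over a hand-written state graph in Ruby, not the abductive event calculus planner, and the domain and names are hypothetical. It returns every action sequence that reaches the goal, and it refuses to revisit a state already on the current path, which is what keeps the search finite and stops the same plan being produced with extra loops in it.

```ruby
# Hypothetical domain: for each state, the actions available and the state they lead to.
ACTIONS = {
  start:   { fill_online: :filled, fill_paper: :scanned, idle: :start },
  scanned: { transcribe: :filled },
  filled:  { validate: :valid, clear: :start },
  valid:   { submit: :done }
}.freeze

# Return every loop-free sequence of actions leading from state to goal.
def all_plans(state, goal, actions, visited = [state])
  return [[]] if state == goal # nothing left to do: one empty plan

  actions.fetch(state, {}).flat_map do |action, next_state|
    # Revisiting a state on the current path would loop forever, so prune it.
    next [] if visited.include?(next_state)

    all_plans(next_state, goal, actions, visited + [next_state])
      .map { |rest| [action] + rest }
  end
end

all_plans(:start, :done, ACTIONS).each do |plan|
  puts plan.join(" -> ")
end
```

In this toy domain the `idle` and `clear` actions lead back to a state already visited, so they never appear in a plan, while both routes to the goal are kept.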

My version of the planner adds the following features:

  • Conditional Planning
  • Impossible Predicate
  • Occured and NotOccured predicates

Read the rest of this entry »

Download the thesis (PDF): http://www.doc.ic.ac.uk/teaching/projects/Distinguished04/JosephWilk.pdf

This project took HTML form systems as a model and built a Workflow Management System that uses artificial intelligence planning methodologies and Event Calculus workflow specifications to try to overcome some of the problems of Workflow Management Systems. Logic, server-side languages and planning all rolled into one.

iWFMS admin interface

The development of the Workflow Management System with AI uncovered interesting issues in modelling situations in the Event Calculus and the problems that need to be overcome to use AI with workflow. The problems and solutions developed in the project cover a wide spectrum of domains, looking at logic programming, server-side languages and getting the two to talk to each other. Areas covered range from the typing of HTML to new frameworks for Prolog running as CGI.

Achievements

  • Workflow specification language
    Using the Event Calculus and extensions to specify workflow.
  • HTML form typing (HTML typing in Prolog)
    A typing engine for ensuring that the HTML form element specifications are
    correct when used in workflow specifications.
  • A visualisation tool for Event Calculus plans
    A tool that generates Scalable Vector Graphics (SVG) graphs for Event Calculus
    plans.
  • An HTML/PHP iWFMS engine
    Using the plans generated from the workflow specifications to support the
    running and management of a system.
  • A JavaScript plan execution engine
    Facilitates the following of workflow plans in a scripting language that runs
    while the user is viewing and interacting with a web page.
  • Logic programming running as Common Gateway Interface (CGI)
    A framework for the use of high-level declarative programming languages
    functioning as CGI.
  • Logic programming and server-side language interaction model (interaction support between PHP and Prolog)
    An interaction model allowing server-side languages used for generating web
    pages to interact with logic programming languages.
  • A Hospital model working example
    An example of how the specification can be utilised for a real-world scenario in
    a hospital, providing the full functionality within the iWFMS to run and manage
    this system.

Automatic Tag Generation

October 22nd, 2007

This project looked at dynamically generating suggestion tags for content. To simplify the task some constraints were introduced.

  • The content which will be tagged is news articles with HTML markup.
  • Only English content.

I used the following HTML page to experiment with suggestion tags: http://news.bbc.co.uk/1/hi/entertainment/6624223.stm

To help evaluate the tagging methods I asked a sample of people to suggest what they thought the best tags would be. They came up with:

paris, hilton, paris hilton, jail, jail sentence, drink-driving
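
One simple way to use such a reference list is to score a tagging method by how much its output overlaps with the human suggestions. The Ruby sketch below is only an illustration of that idea: the method names are hypothetical, the `suggested` list is made-up example input rather than the output of any method from this project, and precision and recall are an assumed choice of measure.

```ruby
# The tags suggested by the sample of people, as listed above.
HUMAN_TAGS = ["paris", "hilton", "paris hilton", "jail", "jail sentence", "drink-driving"].freeze

# Compare tags case-insensitively and ignore duplicates.
def normalise(tags)
  tags.map { |tag| tag.downcase.strip }.uniq
end

# Precision: share of suggested tags that people also chose.
# Recall: share of the people's tags that the method found.
def score(suggested, reference)
  s = normalise(suggested)
  r = normalise(reference)
  hits = s & r
  {
    hits: hits,
    precision: s.empty? ? 0.0 : hits.size.to_f / s.size,
    recall: r.empty? ? 0.0 : hits.size.to_f / r.size
  }
end

# Hypothetical output of a tagging method, for illustration only.
suggested = ["Paris Hilton", "jail", "court", "bbc"]
result = score(suggested, HUMAN_TAGS)
puts "hits: #{result[:hits].join(', ')}"
puts "precision: #{result[:precision].round(2)}, recall: #{result[:recall].round(2)}"
```

A score like this makes it possible to compare different tagging methods against the same set of human suggestions.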

Read the rest of this entry »