Latent Semantic Analysis in Ruby
21 Nov
I’ve had lots of requests for a Ruby version of my Latent Semantic Analysis in Python article, so I’ve rewritten both the code and the article for Ruby. This time I wrote LSA from scratch, test-driven, so it has some subtle differences from the Python version.
What is LSA?
Latent Semantic Analysis (LSA) is a mathematical method that tries to bring out latent relationships within a collection of documents. Rather than looking at each document in isolation, it looks at all the documents as a whole, and at the terms within them, to identify relationships.
An example of LSA:
Using a search engine, search for “ruby”.
Documents are returned that do not contain the search term “ruby” but do contain terms like “rails”.
LSA has identified a latent relationship: “ruby” is semantically close to “rails”.
How does it work?
Given a set of documents, each word in those documents represents a point in the semantic space. LSA uses a mathematical technique called singular value decomposition (SVD) to take the documents and words, represented as a matrix, and produce a reduced approximation of that matrix. In doing this it reduces the overall noise in the semantic space, bringing related words closer together. Hence, after applying LSA, some words occupy similar points in the semantic space: they are semantically similar.
These groups of semantically similar words form concepts and those concepts in turn relate to documents.
Term a <----------->
Term b <-----------> Concept d <---------> Document e
Term c <----------->
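To make the matrix reduction described above concrete, here is a minimal sketch using Ruby's `matrix` library (part of the standard library, and a bundled gem since Ruby 3.1). It is not the implementation from this article: the toy corpus and the method names `term_document_matrix`, `reduce_rank` and `cosine` are made up for illustration. Because `matrix` has no built-in SVD, the sketch assumes we can get the truncated SVD indirectly, by projecting the term-document matrix onto the top k eigenvectors of AᵀA (which are the right singular vectors of A).

```ruby
require 'matrix'

# Toy corpus, invented for this illustration.
DOCS = [
  'ruby rails web',
  'ruby rails gem',
  'python django web'
].freeze

# Build a term-document count matrix: one row per term, one column per document.
def term_document_matrix(docs)
  tokenized = docs.map(&:split)
  terms = tokenized.flatten.uniq.sort
  rows = terms.map { |term| tokenized.map { |words| words.count(term).to_f } }
  [terms, Matrix.rows(rows)]
end

# Rank-k approximation of the matrix a.
# The eigenvectors of a^T * a are the right singular vectors of a, so
# projecting a onto the k strongest of them gives the truncated SVD of a.
def reduce_rank(a, k)
  eigen = (a.transpose * a).eigensystem
  vectors = eigen.eigenvectors
  # Indices of the k largest eigenvalues.
  order = eigen.eigenvalues.each_with_index
               .sort_by { |value, _| -value }
               .first(k)
               .map(&:last)
  v_k = Matrix.columns(order.map { |i| vectors[i].normalize.to_a })
  a * v_k * v_k.transpose
end

# Cosine similarity between two term (row) vectors.
def cosine(u, v)
  u.inner_product(v) / (u.norm * v.norm)
end

terms, a = term_document_matrix(DOCS)
reduced = reduce_rank(a, 2)

ruby_row = terms.index('ruby')
gem_row  = terms.index('gem')

puts cosine(a.row(ruby_row), a.row(gem_row))             # similarity before reduction
puts cosine(reduced.row(ruby_row), reduced.row(gem_row)) # similarity after reduction
```

On this toy corpus the similarity between “ruby” and “gem” rises from about 0.71 to about 0.96 once the matrix is reduced to rank 2, which is the "bringing words together" effect in miniature. A real implementation would more likely call a proper SVD routine from a numerical library, and would usually weight the counts (for example with tf-idf) before reducing.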
