Latent Semantic Analysis in Python
19 Dec
Latent Semantic Analysis (LSA) is a mathematical method that tries to bring out latent relationships within a collection of documents. Rather than looking at each document isolated from the others it looks at all the documents as a whole and the terms within them to identify relationships.
An example of LSA:
Using a search engine search for “sand“.
Documents are returned which do not contain the search term “sand” but contains terms like “beach”.
LSA has identified a latent relationship, “sand” is semantically close to “beach”.
There are some very good papers which describing LSA in detail:
- An introduction to LSA: http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
- Creating your own LSA space: http://www.andrew.cmu.edu/user/jquesada/pdf/bookSpacesRev1.pdf
- Latent Semantic analysis: http://en.wikipedia.org/wiki/Latent_semantic_indexing
This is an implementation of LSA in Python (2.4+). Thanks to scipy its rather simple!