<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Latent Semantic Analysis in Python</title>
	<atom:link href="http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html</link>
	<description>on AI, The Web, Usability, Testing &#38; Software process</description>
	<lastBuildDate>Wed, 27 Jan 2010 02:14:31 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Gregg Lind</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-1072</link>
		<dc:creator>Gregg Lind</dc:creator>
		<pubDate>Sat, 06 Dec 2008 17:37:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-1072</guid>
		<description>One major problem with LSA is that it doesn&#039;t scale well to large sizes, because of the LDU matrix decomposition.... it&#039;s O(docs^2 + unique_terms^2), even when calculated using sparse methods.  It can also be very hard to interpret what the &quot;factors&quot; really mean.</description>
		<content:encoded><![CDATA[<p>One major problem with LSA is that it doesn&#8217;t scale well to large sizes, because of the LDU matrix decomposition&#8230;. it&#8217;s O(docs^2 + unique_terms^2), even when calculated using sparse methods.  It can also be very hard to interpret what the &#8220;factors&#8221; really mean.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ESER</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-392</link>
		<dc:creator>ESER</dc:creator>
		<pubDate>Tue, 24 Jun 2008 22:43:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-392</guid>
		<description>Andreas: we&#039;re applying LSA (actually, PLSI) to textual data for our recommendation algorithm; we launched a demo of it a few days ago.

As Joseph points out, the real trick is dimensionality reduction. How you tune your dimensionality reduction algorithm and/or heuristic will have a significant impact on precision and recall (and performance).

Overall, however, LSA works very well for computing recommendations from textual data. Currently we&#039;re applying PLSI to only the English Wikipedia corpus; however, it would probably work well on the IMDB dataset, too.

-ESer.org</description>
		<content:encoded><![CDATA[<p>Andreas: we&#8217;re applying LSA (actually, PLSI) to textual data for our recommendation algorithm; we launched a demo of it a few days ago.</p>
<p>As Joseph points out, the real trick is dimensionality reduction. How you tune your dimensionality reduction algorithm and/or heuristic will have a significant impact on precision and recall (and performance).</p>
<p>Overall, however, LSA works very well for computing recommendations from textual data. Currently we&#8217;re applying PLSI to only the English Wikipedia corpus; however, it would probably work well on the IMDB dataset, too.</p>
<p>-ESer.org</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Wilk</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-307</link>
		<dc:creator>Joseph Wilk</dc:creator>
		<pubDate>Fri, 30 May 2008 19:11:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-307</guid>
		<description>Hello Andreas.

Ultimately with LSA we are using a vector space model. No matter how we derive the weights (pLSA, LSA, LDA), we map to that model.

So within the model we can use cosine to measure the relatedness of documents in the vector space.

With your example I can see you could find subtitles that were related. As for recommendations, it really depends on what your movie data set is like: whether it makes sense to map a movie data item to a document in the vector space (where each vector is a subtitle document) and use cosine to find recommendations for subtitles.</description>
		<content:encoded><![CDATA[<p>Hello Andreas.</p>
<p>Ultimately with LSA we are using a vector space model. No matter how we derive the weights (pLSA, LSA, LDA), we map to that model.</p>
<p>So within the model we can use cosine to measure the relatedness of documents in the vector space.</p>
<p>With your example I can see you could find subtitles that were related. As for recommendations, it really depends on what your movie data set is like: whether it makes sense to map a movie data item to a document in the vector space (where each vector is a subtitle document) and use cosine to find recommendations for subtitles.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andreas</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-289</link>
		<dc:creator>Andreas</dc:creator>
		<pubDate>Sun, 25 May 2008 10:32:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-289</guid>
		<description>Hello to all, 
 great post


I am a student and my thesis is on recommendation systems.
I am working on a movie data set (using SVD).

I was wondering if we could use LSA on a subtitle data set
(if there is one) in order to get recommendations.

It&#039;s just an idea; I see you don&#039;t list recommendations among
LSA’s applications, so I just had to ask.

thank you</description>
		<content:encoded><![CDATA[<p>Hello to all,<br />
 great post</p>
<p>I am a student and my thesis is on recommendation systems.<br />
I am working on a movie data set (using SVD).</p>
<p>I was wondering if we could use LSA on a subtitle data set<br />
(if there is one) in order to get recommendations.</p>
<p>It&#8217;s just an idea; I see you don&#8217;t list recommendations among<br />
LSA’s applications, so I just had to ask.</p>
<p>thank you</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Wilk</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-198</link>
		<dc:creator>Joseph Wilk</dc:creator>
		<pubDate>Wed, 23 Apr 2008 21:11:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-198</guid>
		<description>I agree the lack of statistical foundations of LSA does leave some room for improvement.

My next release will be Latent Dirichlet allocation, implemented in Ruby. I&#039;ll post here as soon as I&#039;ve got the code up (and I&#039;ll release the code before I write up the blog post if you want to see it early).</description>
		<content:encoded><![CDATA[<p>I agree the lack of statistical foundations of LSA does leave some room for improvement.</p>
<p>My next release will be Latent Dirichlet allocation, implemented in Ruby. I&#8217;ll post here as soon as I&#8217;ve got the code up (and I&#8217;ll release the code before I write up the blog post if you want to see it early).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shankar</title>
		<link>http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html/comment-page-1#comment-196</link>
		<dc:creator>Shankar</dc:creator>
		<pubDate>Tue, 22 Apr 2008 18:42:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.joesniff.co.uk/projects/latent-semantic-analysis-in-python.html#comment-196</guid>
		<description>Can you please put up source code for PLSA as well? It sounds more challenging. LSA has a lot of scope for improvement, PLSA being one method. So please work on it and put up the source code if you can.</description>
		<content:encoded><![CDATA[<p>Can you please put up source code for PLSA as well? It sounds more challenging. LSA has a lot of scope for improvement, PLSA being one method. So please work on it and put up the source code if you can.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
