mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: lsi
Date Mon, 14 Nov 2011 06:16:00 GMT
I use seq2sparse + ssvd. What happens after that varies, and in my case is
proprietary, but it mainly revolves around fold-in updates into the
pretrained term space and various locality-sensitive tricks, depending on
your query patterns. My pattern involves scanning the first n nearest
neighbours, smallest distance first, preferably without examining the entire
neighbourhood, as opposed to finding all neighbours within a given distance
radius, which is what most algorithms actually do out of the box.
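
To make the fold-in and nearest-neighbour steps concrete, here is a minimal
NumPy sketch. It is not Mahout code and not the proprietary pipeline mentioned
above: the function names (train_lsi, fold_in, n_nearest) are hypothetical,
numpy.linalg.svd stands in for the factors an ssvd run would produce, and the
neighbour search is plain brute force over cosine distance, so it does examine
every document, unlike a locality-sensitive scheme.

```python
import heapq

import numpy as np


def train_lsi(term_doc, k):
    """Rank-k SVD of a terms x documents matrix (stand-in for an ssvd run)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]
    doc_vecs = vt[:k, :].T  # one k-dimensional row per training document
    return u_k, s_k, doc_vecs


def fold_in(term_vec, u_k, s_k):
    """Project a new document's term vector into the pretrained space.

    Classic LSI fold-in: d_hat = d^T U_k S_k^{-1}. The trained factors are
    left unchanged; only the new document gets coordinates.
    """
    return (term_vec @ u_k) / s_k


def n_nearest(query, doc_vecs, n):
    """Return the n closest documents as (index, cosine distance) pairs,
    smallest distance first. Brute force: every document is scored."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query)
    dists = 1.0 - (doc_vecs @ query) / norms
    return heapq.nsmallest(n, enumerate(dists.tolist()), key=lambda p: p[1])
```

Folding in a document that was part of the training matrix should reproduce
its row of doc_vecs, which is a convenient sanity check. In a real deployment
the brute-force scan in n_nearest would be the part to replace with a
locality-sensitive index.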

I suspect, although I am not quite convinced, that document mixture models
such as LDA would produce a better fit than classic SVD-based LSI.
On Nov 13, 2011 10:48 AM, "Sebastian Schelter" <> wrote:

> Is there some documentation/tutorial available on how to build an LSI
> pipeline with Mahout and Lucene?
> --sebastian
