mahout-user mailing list archives

From Geek Gamer <>
Subject Re: Mixing similarity measures
Date Tue, 29 May 2012 05:36:37 GMT
RI seems pretty interesting.

Do you have any reference paper or system showing how people have used it
to improve recommendation systems, and how they define context vectors
using extra information?

A quick idea I had was to use LDA to build topic vectors and use them
as context vectors. Any thoughts on that?
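
To make the idea concrete, here is a rough sketch of what I have in mind,
not a tested design. It assumes per-item topic weights are already available
(hard-coded below as a stand-in for LDA output). Each topic gets a sparse
random index vector, an item's context vector is the weighted sum of its
topic index vectors, and a user's vector is the sum of the context vectors of
the items they touched. All names, the dimensionality, and the toy data are
made up for illustration.

```python
import math
import random

DIM = 512      # length of each random index vector (assumed value)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)


def index_vector(rng):
    """Sparse ternary index vector: a few random +1/-1 entries, rest 0."""
    vec = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def add_scaled(target, source, weight):
    """In-place target += weight * source."""
    for i, value in enumerate(source):
        target[i] += weight * value


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


rng = random.Random(42)

# Hypothetical topic weights per item, standing in for real LDA output.
item_topics = {
    "item_a": {"scifi": 0.9, "drama": 0.1},
    "item_b": {"scifi": 0.8, "drama": 0.2},
    "item_c": {"cooking": 1.0},
}
topic_index = {t: index_vector(rng) for t in ("scifi", "drama", "cooking")}


def item_context(item):
    """Context vector of an item: weighted sum of its topic index vectors."""
    vec = [0.0] * DIM
    for topic, weight in item_topics[item].items():
        add_scaled(vec, topic_index[topic], weight)
    return vec


def user_vector(items):
    """User vector: sum of the context vectors of the user's items."""
    vec = [0.0] * DIM
    for item in items:
        add_scaled(vec, item_context(item), 1.0)
    return vec


# No two users share an item, so plain cooccurrence would be zero for all pairs.
alice = user_vector(["item_a"])
bob = user_vector(["item_b"])
carol = user_vector(["item_c"])

sim_alice_bob = cosine(alice, bob)      # items with similar topic mixes
sim_alice_carol = cosine(alice, carol)  # items with unrelated topics
print(sim_alice_bob, sim_alice_carol)
```

The point is only that users with no items in common can still end up with
similar vectors when their items share topics. Whether that helps in practice
depends on the quality of the topics.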

RI seems to be a good candidate for contribution to Mahout.

On Wed, May 23, 2012 at 12:11 PM, Ted Dunning <> wrote:
> RI, per se, probably won't help that much with the coincidence problem.
> The Mahout math libraries would help a lot with a random indexing
> implementation.
> Kitenga has some very nice random indexing support.  See
> They offer commercial software, but you get what you pay for.
> On Wed, May 23, 2012 at 12:18 AM, Mugoma Joseph Okomba <> wrote:
>> Thanks for all the comments. They give us an idea of what direction to take.
>> We have been zeroing in on the idea of Random Indexing, but RI seems to be
>> missing from Mahout currently. Are there future plans for implementing RI
>> in Mahout? Are there any libraries out there that would be useful for RI?
>> On Sun, May 20, 2012 9:47 am, Ted Dunning wrote:
>> > The basic reasoning here is that any cooccurrence measure without
>> > smoothing will have zero overlap whenever all the others have zero
>> > overlap. This seems to be the root of your problem. The solution is to
>> > increase overlap or increase data.
>> >
>> > The problem with correlation-based approaches is that they overstate
>> > coincidental overlaps. Fixing that can't fix the problem of no overlap.
>> >
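
A small illustration of the zero-overlap point quoted above, using toy data
and three common unsmoothed measures (raw overlap count, Jaccard, and cosine
on binary sets). The user sets and function names are invented for the
example. It is only meant to show that switching between such measures
cannot produce a signal when the item sets are disjoint.

```python
import math


def overlap_count(a, b):
    """Number of items both users have interacted with."""
    return len(a & b)


def jaccard(a, b):
    """Shared items divided by all items seen by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_binary(a, b):
    """Cosine similarity when each user is a 0/1 vector over items."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


measures = (overlap_count, jaccard, cosine_binary)

user_1 = {"item_a", "item_b"}
user_2 = {"item_c", "item_d"}   # nothing in common with user_1
user_3 = {"item_b", "item_c"}   # shares item_b with user_1

# Every measure is zero for the disjoint pair and positive for the other.
disjoint_scores = [m(user_1, user_2) for m in measures]
shared_scores = [m(user_1, user_3) for m in measures]
print(disjoint_scores, shared_scores)
```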
