mahout-user mailing list archives

From Julian Limon
Subject Re: Top items in a vector
Date Mon, 25 Apr 2011 22:50:30 GMT
2011/4/25 Dmitriy Lyubimov

> if he just needs top N with the highest similarity score...well...
> that's kind of a problem i am solving for LSI over hbase right now...
> I don't want to disclose exactly how, or Ted will say that's not the
> way :) But, there are definitely ways to organize the vector space
> model to find N closest without scanning the entire vector set.
> -d
Hello Dmitriy,

I'd love to know more about this non-disclosable solution! I'm basically
using Mahout's command-line instructions to model an LSI space (along with
some custom code to get the query to execute against the original
dictionary). My guess is that my current solution is too naive and does not
take advantage of the full capabilities of Mahout. If you want to disclose
some of the specifics, I'd love to open a new thread to discuss my approach.
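
For reference, the naive baseline that the quoted message contrasts against (score every vector against the query, keep the N best) might look roughly like the sketch below. It is plain Python, not Mahout code, and it is not the undisclosed approach mentioned above. The names `cosine`, `top_n`, and the toy `docs` dictionary are hypothetical and chosen only for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_n(query, docs, n):
    """Return the n (doc_id, score) pairs with the highest cosine similarity.

    Every document is scored once (a full scan); heapq.nlargest keeps only
    n candidates at a time instead of sorting the whole collection.
    """
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in docs.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical toy data: three documents in a 2-dimensional reduced space.
docs = {
    "d1": [1.0, 0.0],
    "d2": [0.0, 1.0],
    "d3": [1.0, 1.0],
}
result = top_n([1.0, 0.1], docs, 2)
```

The cost here grows linearly with the number of vectors, which is the full scan the quoted message says can be avoided by organizing the vector space differently.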


