mahout-user mailing list archives

From Juraj Vitko
Subject which algorithm for determining document similarity?
Date Thu, 21 Jun 2012 19:53:37 GMT
Hello everyone,

I'm designing a discussion/knowledge-exchange system where users can
submit a textual message and, based on the content of that message, the
system returns the previously entered messages that are topically most
similar.

In existing systems, this is normally achieved by having people explicitly
specify #hashtags in their messages. I would prefer to do this in a more
natural way, based purely on the content of the message, if that is
possible in real time.

I'm new to Mahout and to machine learning in general, so I will refrain from
guessing. My question is: can Mahout be used for this, in real time, in a
distributed way, and incrementally as new messages are added to the system?
If so, could you please give a few pointers on how to approach this?

