mahout-user mailing list archives

From Raphael Cendrillon <cendrillon1...@gmail.com>
Subject Re: Suggestions on distance measures for clustering news articles
Date Mon, 09 Jan 2012 02:54:53 GMT
Thanks!

By the way, did anyone else participate in the codesprint this year?

It was nice to see a few machine learning problems show up, like clustering and classification.


On 8 Jan, 2012, at 6:44 PM, Lance Norskog wrote:

> The cool thing about Cosine Similarity is that it is (roughly) what
> Lucene uses. This means that once you tune your recommender, it is
> possible to transform it into a Lucene index.
> 
> How? I don't know. Ted did this at Veoh.
> 
> On Sun, Jan 8, 2012 at 5:14 AM, Robert Giacinto
> <robert.giacinto@gmail.com> wrote:
>> Hi Raphael,
>> 
>> Cosine Similarity is always a good choice.
>> 
>> You can find an evaluation of different distance measures for text
>> clustering problems in "Similarity Measures for Text Document Clustering" by
>> Anna Huang, 2008.
>> http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
>> 
>> -- Robert
>> 
>> 
>> 2012/1/8 Raphael Cendrillon <cendrillon1978@gmail.com>
>> 
>>> Thanks Yue!
>>> 
>>> On Jan 7, 2012, at 6:17 PM, Yue Guan <pipehappy@gmail.com> wrote:
>>> 
>>>> Hi, Raphael
>>>> 
>>>> Cosine distance is good for text. You may try it.
>>>> 
>>>> --Yue
>>>> 
>>>> On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
>>>> <cendrillon1978@gmail.com> wrote:
>>>>> Hi everyone,
>>>>> 
>>>>> I'm working on a problem clustering news articles around common themes.
>>>>> There seem to be quite a few different distance measures that can be
>>>>> applied.
>>>>> 
>>>>> Does anyone have any suggestions on a good general purpose measure to
>>>>> start out with?
>>>>> 
>>>>> Thanks!
>>> 
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
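
For reference, the cosine distance recommended throughout this thread can be sketched in a few lines. This is only an illustration, not Mahout's or Lucene's implementation: the helper names (term_counts, cosine_distance) and the three example headlines are made up, and raw term counts are used for simplicity where a real pipeline would more likely use TF-IDF weights.

```python
import math
from collections import Counter


def term_counts(text):
    """Hypothetical helper: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an empty document as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Made-up headlines: the first two share a theme, the third does not.
doc1 = term_counts("stocks fall as markets react to rate hike")
doc2 = term_counts("markets react as central bank announces rate hike")
doc3 = term_counts("local team wins championship final")

d12 = cosine_distance(doc1, doc2)  # overlapping vocabulary, smaller distance
d13 = cosine_distance(doc1, doc3)  # no shared terms, distance of 1
print(d12, d13)
```

Because the measure depends only on the angle between the vectors, a long article and a short one on the same theme can still come out close, which is one reason it tends to suit text.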

