mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Suggestions on distance measures for clustering news articles
Date Mon, 09 Jan 2012 05:16:28 GMT
The easy way to fake a recommendation engine using a search engine is to
put the related items in as a field on each item.

Then if you want to find items related to this item, you can just pull in
the field.

If you put the reverse links in a different field, then you can do
recommendations given a history of items by using those items as a query.
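
To make the two-field idea concrete, here is a minimal sketch in plain Python rather than a real search engine. Everything in it is an assumption made for illustration: the `related` table stands in for whatever item-to-item output your recommender produces, and `related_items`, `reverse_links`, and `recommend` are hypothetical names. Scoring by simple overlap count is only a stand-in for the engine's own relevance scoring.

```python
from collections import Counter, defaultdict

# Hypothetical item-to-item output of an offline recommender:
# item id -> ids of its related items (the "related" field on each item).
related = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a"],
    "d": ["c"],
}


def related_items(item_id):
    """Item-to-item lookup: just read the stored field."""
    return related.get(item_id, [])


def reverse_links(related):
    """Build the second field: for each item, the items that list it as related."""
    reverse = defaultdict(list)
    for source, targets in related.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


def recommend(history, reverse, top_n=3):
    """Use the history as a query against the reverse-link field.

    Each candidate is scored by how many history items appear in its
    reverse-link field; items already in the history are skipped.
    """
    seen = set(history)
    scores = Counter()
    for item_id, indicators in reverse.items():
        if item_id in seen:
            continue
        overlap = len(seen.intersection(indicators))
        if overlap:
            scores[item_id] = overlap
    # Highest score first, ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:top_n]]
```

In a real deployment the reverse-link field would be indexed by the search engine, and the history would be sent as an ordinary query against that field, so ranking and filtering come from the engine instead of the loop above.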

On Sun, Jan 8, 2012 at 6:44 PM, Lance Norskog <goksron@gmail.com> wrote:

> The cool thing about Cosine Similarity is that it is (roughly) what
> Lucene uses. This means that once you tune your recommender, it is
> possible to transform it into a Lucene index.
>
> How? I don't know. Ted did this at Veoh.
>
> On Sun, Jan 8, 2012 at 5:14 AM, Robert Giacinto
> <robert.giacinto@gmail.com> wrote:
> > Hi Raphael,
> >
> > Cosine Similarity is always a good choice.
> >
> > You can find an evaluation of different distance measures for text
> > clustering problems in "Similarity Measures for Text Document
> > Clustering" by Anna Huang, 2008.
> >
> > http://nzcsrsc08.canterbury.ac.nz/site/proceedings/Individual_Papers/pg049_Similarity_Measures_for_Text_Document_Clustering.pdf
> >
> > -- Robert
> >
> >
> > 2012/1/8 Raphael Cendrillon <cendrillon1978@gmail.com>
> >
> >> Thanks Yue!
> >>
> >> On Jan 7, 2012, at 6:17 PM, Yue Guan <pipehappy@gmail.com> wrote:
> >>
> >> > Hi, Raphael
> >> >
> >> > Cosine distance is good for text. You may try it.
> >> >
> >> > --Yue
> >> >
> >> > On Sat, Jan 7, 2012 at 9:05 PM, Raphael Cendrillon
> >> > <cendrillon1978@gmail.com> wrote:
> >> >> Hi everyone,
> >> >>
> >> >> I'm working on a problem clustering news articles around common
> >> >> themes. There seem to be quite a few different distance measures
> >> >> that can be applied.
> >> >>
> >> >> Does anyone have any suggestions on a good general purpose measure
> >> >> to start out with?
> >> >>
> >> >> Thanks!
> >>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
