My approach uses the same technique, but I'm mostly using hierarchical
agglomerative (HAC) clustering.
I did manage to add clustering support to a Lucene-based application (a
customized solution), but I'd like to try to create a 'general purpose'
library. I know it ain't easy!
I've found many scaling issues, but I saw that with optimized algorithms
you can get pretty good results. Reading Carrot2- and Lucene-related
messages, I figured out that I can cluster only the first n results,
avoiding performance issues that way.
Lucene offers good support for a clustering framework based on tf-idf
analysis (I haven't considered k-means or EM until now).
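A minimal, library-independent sketch of the idea above: take only the first n hits, build tf-idf vectors from their text, and merge them with average-link hierarchical agglomerative clustering until no pair of clusters is more similar than a threshold. It does not use any Lucene API. The function names, the whitespace tokenizer, `top_n` and the `threshold` stopping rule are illustrative assumptions, not part of the implementation described in this thread.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse tf-idf vectors (dict term -> weight); whitespace tokenizer is an assumption."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hac_clusters(vectors, threshold):
    """Average-link agglomerative clustering over document indices."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        return sum(cosine(vectors[i], vectors[j])
                   for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:  # hypothetical stopping rule
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def cluster_top_results(results, top_n=100, threshold=0.1):
    """Cluster only the first top_n result texts (hypothetical helper)."""
    top = results[:top_n]
    return [[top[i] for i in c] for c in hac_clusters(tfidf_vectors(top), threshold)]
```

The naive merge loop is roughly cubic in the number of documents, which is why capping the input at the first n hits keeps it usable; a real implementation would cache pairwise similarities.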
The most interesting problem is designing an architecture for such a system
that is general purpose but also very efficient.
Thanks,
Lorenzo
On 6/8/05, Daniel Stephan <fast.jack@gmx.net> wrote:
>
> I am currently writing something about text retrieval using EM clustering.
> The approach represents documents as high-dimensional vectors, but it is
> not related to Lucene (yet?).
> How would you add clustering to Lucene? I think it could be a very
> interesting technique for improving search results, if it works. My
> experience so far shows that it scales rather badly for larger document
> collections.
>
> I don't think I will take part in Google's SoC, as I have my own "summer
> of code" right now. But I would surely like to take part in discussions
> about the topic, or at least read along and throw in my 2 cents now and then.
>
> cheers
> Daniel
>
>
> Lorenzo wrote:
>
> >Some people have already replied, but I forgot the most important thing...
> >I'm thinking of this project as part of Google's Summer of Code program,
> >so I'm looking for other students.
> >I've sent an email to Erik and he told me that we can propose this as part
> >of Google's SoC if we find some other people interested in it.
> >Lorenzo
> >
> >On 6/7/05, Lorenzo <lorenzo.viscanti@gmail.com> wrote:
> >
> >
> >>I'm writing this message to find people interested in creating a
> >>'general purpose' Lucene search-results clustering extension.
> >>I wrote a simple implementation of clustering, and I would like to
> >>contribute to Lucene development by releasing it as an open source
> >>clustering implementation. I know that each project may need a different
> >>implementation, but this would be a useful basis for everyone to develop
> >>their own project.
> >>Is anyone interested in it?
> >>Lorenzo