lucene-java-user mailing list archives

From Lorenzo <>
Subject Re: Lucene search clusters
Date Tue, 07 Jun 2005 23:19:54 GMT
My approach uses the same technique, but I'm mostly using HAC (hierarchical
agglomerative clustering).
I did manage to add clustering support to a Lucene-based application (a
customized solution), but I'd like to try to create a 'general purpose'
library. I know it ain't easy!
I've found many scaling issues, but I saw that with optimized algorithms
you can get pretty good results. Reading the Carrot2- and Lucene-related
messages, I figured out that I can cluster only the first n results,
which avoids the performance issues.
Lucene offers good support for a clustering framework based on tf-idf
analysis (I haven't considered k-means or EM so far).
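
A minimal sketch of the idea above: build tf-idf vectors for only the first n
results and cluster them with average-link agglomerative clustering. This is
not Lucene or Carrot2 code. The function names (tfidf_vectors, cosine,
cluster_top_n), the whitespace tokenizer, the idf formula, and the similarity
threshold are all assumptions made for illustration; a real implementation
would presumably take term statistics from the index instead.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised tf-idf vector (dict: term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed idf (assumption): log(N / df) + 1.
        vec = {t: f * (math.log(n / df[t]) + 1.0) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster_top_n(ranked_docs, n=50, threshold=0.2):
    """Average-link agglomerative clustering of the first n ranked results.

    Returns a list of clusters, each a list of indices into ranked_docs.
    Merging stops once the best pair is less similar than `threshold`.
    """
    top = ranked_docs[:n]  # only the first n hits are ever clustered
    vectors = tfidf_vectors(top)
    clusters = [[i] for i in range(len(top))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    hits = [
        "lucene index search engine",
        "lucene search index query",
        "summer football match score",
        "football match final score",
    ]
    print(cluster_top_n(hits, n=4))
```

The merge loop here is deliberately naive and its cost grows quickly with the
number of documents, which is why capping the input at the first n results
keeps it workable.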
The most interesting problem is designing the architecture for such a system
so that it is general purpose but also very efficient.

On 6/8/05, Daniel Stephan <> wrote:
> I am currently writing something about text retrieval using EM clustering.
> The approach represents documents as high-dimensional vectors, but it
> is not related to Lucene (yet?).
> How would you add clustering to Lucene? I think it may be a very
> interesting technique to improve search results. If it works. My current
> experience shows that it scales rather badly for larger document
> collections.
> I don't think I will take part in Google's SoC, as I have my own "summer
> of code" right now. But I would surely like to take part in discussions
> about that topic, or at least read them and throw in my 2 cents now and
> then.
> cheers
> Daniel
> Lorenzo wrote:
> >Some people just replied, but I forgot the most important thing...
> >I'm thinking of this project as part of Google's Summer of Code program,
> >so I'm looking for other students.
> >I've sent an email to Erik and he told me that we can propose this as part
> >of Google's SoC if we find some other people interested in it.
> >Lorenzo
> >
> >On 6/7/05, Lorenzo <> wrote:
> >
> >
> >>I'm writing this message to find people interested in creating
> >>a 'general purpose' clustering extension for Lucene search results.
> >>I wrote a simple implementation of clustering, and I would like to
> >>contribute to Lucene development by releasing an open source clustering
> >>implementation. I know that each project may need a different
> >>implementation, but it would be a useful basis for everyone to develop
> >>their own project.
> >>Is anyone interested in it?
> >>Lorenzo
> >>
> >>
> >>
> >
> >
> >
