lucene-java-user mailing list archives

From Lorenzo <>
Subject Re: Lucene search clusters
Date Wed, 08 Jun 2005 13:33:26 GMT
First, thanks for your reply.
I have been thinking about adding some clustering functionality to
Lucene. I wrote a clustering engine that applies the HAC/AHC and k-means
algorithms to Lucene search results. That work was built for a customized
solution, so I decided to write some more general code. Right now I'm looking
at the class com/mwroblewski/carrot/filter/ahcfilter/AHCFilter from carrot2
and have found it to be very similar to my work ;-)
My aim is to provide some basic clustering functionality for Lucene search
results. Carrot2 offers a lot of features, such as support for many inputs;
I'm just trying to offer a simpler (much simpler!) clustering option for Lucene.
I hope I can get some good advice from you!

On 6/8/05, Dawid Weiss <> wrote:
> You should state your requirements clearly:
> 1. What data do you want to cluster? (whole index / search results)
> 2. What is the role of the extension? How is it going to be used?
> (front-end clusters, query refinement, etc)
> 3. Do you need the implementation or an API for clustering in the
> source code? (I'd personally stick to the API; there are many products
> out there that perform clustering. Carrot2 is no exception -- there is
> an excellent (in my humble opinion :) open source clustering algorithm
> Lingo, but there is also a commercial component that is much faster and
> more customizable. You can start off with an open source clusterer and
> then switch to a commercial product if you want higher scalability or
> different functionality. I implemented such an API in Nutch -- take a
> look in its source code for hints).
> Dawid
> Lorenzo wrote:
> > I see some noise about clustering and lucene, but I'm still waiting for
> > someone that will help me creating a clustering extension.
> > I know both carrot2 and weka (the first can be integrated with Lucene, the
> > latter may be - Falko can you tell me?) but would like to write something
> > that could be included in the sandbox (or similar) with an implementation
> > that we'll find the better for a general purpose environment. Maybe carrot2
> > or other will be the best one (I really hope, I'm a lazy coder;-) ) and so
> > we will simply ask David to extend his code, but first want to make some
> > tests.
> > bye
> > Lorenzo
> >
