lucene-java-user mailing list archives

From Dawid Weiss
Subject Re: Lucene search clusters
Date Wed, 08 Jun 2005 13:43:47 GMT

Lorenzo... Did you take a look at the mail I posted before? There was a 
ready-to-use clustering for Lucene there. It _is_ simple. I don't know 
what you mean by "much simpler" -- much simpler to use? You really don't 
have to know all of the Carrot2 code to use it. You build or fetch a 
precompiled clustering component and add an input to it. This is 
basically what I did before, and it took just a few lines of code to 
integrate clustering into Lucene, so I can hardly call it difficult.

AHC and k-means are both classic clustering algorithms, so their 
implementations will always look similar. Unfortunately, they need a lot 
of tuning to get the results right. With Lingo clustering you can avoid 
much of that work.
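
As a rough illustration of where that tuning comes from, here is a minimal
k-means sketch in Java. It is not Carrot2 or Lucene code: the class name
KMeansSketch, its parameters, and the assumption that each search result has
already been turned into a numeric term-weight vector are all hypothetical.
Even in this tiny version, the number of clusters k, the iteration cap, and
the random seed for the initial centroids have to be chosen by hand, and
different choices can give different clusters.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of Lucene or Carrot2.
final class KMeansSketch {

    /**
     * Clusters the rows of points into k groups and returns, for each row,
     * the index (0..k-1) of the cluster it was assigned to.
     * k, maxIterations and seed are the knobs that need tuning.
     */
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int dim = points[0].length;

        // Pick k distinct input points as initial centroids (seeded shuffle).
        Random random = new Random(seed);
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }

            // Update step: recompute each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Calling KMeansSketch.cluster(vectors, 2, 10, 42L) returns one cluster index
per input row. How the vectors are built from Lucene documents is left out
here on purpose, since that step is itself another thing to tune.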

But anyway, if you're up to the challenge, feel free to contribute in 
any way you want to. I'll be glad to compare your clustering to ours.


Lorenzo wrote:
> First, thanks for your reply.
> I was wondering about adding some extra clustering functionality to 
> Lucene. I wrote a clustering engine that applies HAC/AHC and k-means 
> algorithms to Lucene search results. That work is a customized solution, 
> so I decided to write some general code. Right now I'm looking at this 
> class com/mwroblewski/carrot/filter/ahcfilter/AHCFilter from carrot2 and 
> found it to be very similar to my work;-)
> My aim is to provide some basic clustering functionality for Lucene search 
> results. Carrot2 offers a lot of functionality, like many inputs; I'm just 
> trying to offer a simpler (much simpler!) clustering option for Lucene 
> users.
> Hope I can get some good advice from you!
> ciao,
> Lorenzo
> On 6/8/05, Dawid Weiss <> wrote:
>>You should state your requirements clearly:
>>1. What data do you want to cluster? (whole index / search results)
>>2. What is the role of the extension? How is it going to be used?
>>(front-end clusters, query refinement, etc)
>>3. Do you need the implementation or an API for clustering in the
>>source code? (I'd personally stick to the API; there are many products
>>out there that perform clustering. Carrot2 is no exception -- there is
>>an excellent (in my humble opinion :) open source clustering algorithm
>>Lingo, but there is also a commercial component that is much faster and
>>more customizable. You can then start off with an open source clusterer
>>and switch to a commercial product if you want higher scalability or
>>different functionality. I implemented such an API in Nutch -- take a
>>look in its source code for hints).
>>Lorenzo wrote:
>>>I see some noise about clustering and lucene, but I'm still waiting for
>>>someone that will help me creating a clustering extension.
>>>I know both carrot2 and weka (the first can be integrated with Lucene, 
>>>latter may be - Falko can you tell me?) but would like to write 
>>>that could be included in the sandbox (or similar) with an 
>>>that we'll find the better for a general purpose environment. Maybe 
>>>or other will be the best one (I really hope, I'm a lazy coder;-) ) and 
>>>we will simply ask David to extend his code, but first want to make some
