lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Clustering with Lucene?
Date Tue, 26 Apr 2011 11:56:48 GMT
Can you shed some more light on what you're trying to achieve (what is
the purpose of clustering -- are clusters to be utilized for front-end
user interface, further data mining analysis, etc.)?

With the sizes you report, Carrot2 won't work for you, I'm afraid, but
Mahout may. Still, there are plenty of algorithms and preprocessing
options to consider, so if you provide more background, somebody may
push you in the right direction.

Dawid

On Tue, Apr 26, 2011 at 1:49 PM, vivek sar <vivextra@gmail.com> wrote:
> Hi,
>
>  I've been researching clustering with Lucene. Here is what
> I've found so far:
>
> 1) Lucene clustering with Carrot2 -
> http://download.carrot2.org/head/manual/#section.getting-started.lucene
>   - but this seems suitable only for a small index (a few hundred
> documents) -  http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning.choosing-algorithm
>
> 2) Lucene clustering with Mahout -
> http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3
>   - I'm not sure this is ready for prime time yet; there
> seem to be very few examples of how to do this. Has anyone tried this
> with a large index (millions of documents)?
>
> 3) Some clustering library in Lucene's contribution folder -
> https://issues.apache.org/jira/browse/LUCENE-1421
>   - this doesn't seem to be officially supported, and the last update to
> it was a year ago. Has anyone tried this?
>
> 4) Lucene clustering with Terracotta -
> http://orionl.blogspot.com/2006/11/clustering-lucene.html
>  - I'm not sure how to do this; there doesn't seem to be much
> activity around it.
>
> We have large indexes (over 500 million records); we usually partition
> our index after 20 million records. There are a total of 20 fields in
> our index, of which we are trying to cluster 5. Is there any
> clustering solution for Lucene that would work for us? Carrot2 looked
> the most active and promising, but it's clearly recommended only for a
> small index. Any other suggestions?
>
> Thanks,
> -vivek
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org