lucene-dev mailing list archives

From "Stanislaw Osinski (JIRA)" <>
Subject [jira] [Updated] (SOLR-2450) Carrot2 clustering should use both its own and Solr's stop words
Date Sat, 02 Apr 2011 18:12:05 GMT


Stanislaw Osinski updated SOLR-2450:

    Attachment: SOLR-2450.patch

Patch that uses the stop words from the field's {{StopFilterFactory}} and {{CommonGramsFilterFactory}}
in addition to Carrot2's built-in stop words.

Requires the SOLR-2448 and SOLR-2449 patches to be applied first.

> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>                 Key: SOLR-2450
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Stanislaw Osinski
>            Assignee: Stanislaw Osinski
>            Priority: Minor
>             Fix For: 3.2, 4.0
>         Attachments: SOLR-2450.patch
> While using only Solr's stop words for clustering isn't a good idea (compared to indexing,
> clustering needs more aggressive stop word removal to get reasonable cluster labels), it would
> be good if Carrot2 used both its own and Solr's stop words.
> I'm not sure what the best way to implement this would be though. My first thought was
> to simply load {{stopwords.txt}} from Solr config dir and merge them with Carrot2's. But then,
> maybe a better approach would be to get the stop words from the StopFilter being used? Ideally,
> we should also consider the per-field stop filters configured on the fields used for clustering.
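
The first idea in the description above (load the words from {{stopwords.txt}} and merge them
with Carrot2's own list) can be sketched roughly as follows. This is only an illustration and
not the code in SOLR-2450.patch: the class StopWordMerger and its parse/merge methods are
hypothetical names, no Solr or Carrot2 API is called, and the file format (one word per line,
lines starting with '#' treated as comments) and the lowercasing step are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr or Carrot2.
public class StopWordMerger {

    // Parses stop word lines. Assumed format: one word per line,
    // blank lines and lines starting with '#' are skipped.
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Assumption: stop words are matched case-insensitively.
            words.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns the union of the clustering engine's built-in stop words
    // and the stop words taken from the Solr configuration.
    static Set<String> merge(Set<String> builtIn, Set<String> fromSolr) {
        Set<String> merged = new LinkedHashSet<>(builtIn);
        merged.addAll(fromSolr);
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for the clustering engine's own list.
        Set<String> builtIn = new LinkedHashSet<>(Arrays.asList("the", "of", "com"));
        // Stand-in for the contents of stopwords.txt.
        Set<String> fromSolr = parse(Arrays.asList("# sample comment", "the", "", "Solr "));
        System.out.println(merge(builtIn, fromSolr));
    }
}
```

Taking the words from the filters configured on each clustered field, as the description goes
on to suggest, would change only where the second set comes from; the merge step stays the same.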
