lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-1917) ShingleFilter include words
Date Mon, 09 Nov 2009 16:33:33 GMT


Robert Muir commented on LUCENE-1917:

bq. I'm going to port SOLR-908 rather than reuse ShingleFilter as SF seems to be built tightly
for its use case. 

Jason, is this still your plan? Can we move this out of 3.0 for now?

> ShingleFilter include words
> ---------------------------
>                 Key: LUCENE-1917
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.0
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> By default ShingleFilter creates shingles (i.e. combines tokens
> into a single token) from all tokens. For purposes such as
> indexing stop words as shingles without creating shingles
> out of every word, we can supply ShingleFilter with an
> include-words CharArraySet that determines which tokens to
> shingle. 
> This is similar to Nutch CommonGrams and SOLR-908. SOLR-908
> does not use the new token attribute API, and I figured this
> functionality is better suited to being part of Lucene. 
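A minimal sketch of the proposed behaviour, assuming CommonGrams-style semantics: every token is kept, and an adjacent pair is additionally joined into a shingle only when either token is an include word. It works on plain strings rather than a Lucene TokenStream and uses a java.util.Set in place of CharArraySet; the class IncludeWordsShingler and its separator parameter are hypothetical and are not the ShingleFilter API or the SOLR-908 code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Lucene's API): shingles only adjacent token pairs
 * that involve an "include" word. A Set<String> stands in for CharArraySet
 * and a List<String> stands in for a TokenStream.
 */
public class IncludeWordsShingler {
    private final Set<String> includeWords;
    private final String separator;

    public IncludeWordsShingler(Set<String> includeWords, String separator) {
        this.includeWords = includeWords;
        this.separator = separator;
    }

    /** Emits every unigram, followed by a bigram when either token of the pair is an include word. */
    public List<String> shingle(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // unigrams are always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                // Only pairs touching an include word become shingles
                if (includeWords.contains(current) || includeWords.contains(next)) {
                    out.add(current + separator + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        IncludeWordsShingler shingler = new IncludeWordsShingler(Set.of("the"), " ");
        // prints [the, the quick, quick, brown, fox]
        System.out.println(shingler.shingle(List.of("the", "quick", "brown", "fox")));
    }
}
```

With "the" as the only include word, "quick brown" and "brown fox" are not shingled, so the index gains the stop-word phrase "the quick" without the cost of a shingle for every word pair.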

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

