lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-1917) ShingleFilter include words
Date Thu, 17 Sep 2009 21:17:57 GMT
ShingleFilter include words
---------------------------

                 Key: LUCENE-1917
                 URL: https://issues.apache.org/jira/browse/LUCENE-1917
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/analyzers
    Affects Versions: 2.9
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: 3.0


By default, ShingleFilter creates shingles (i.e. combines adjacent
tokens into a single token) from all tokens. For purposes such as
indexing stop words as shingles without creating shingles out of
every word, we can supply an include-words CharArraySet to
ShingleFilter that determines which tokens to shingle.
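
To make the idea concrete, here is a rough sketch on plain strings rather
than a real TokenStream. It does not use any Lucene classes; the
IncludeWordShingles class and its shingle method are hypothetical names
made up for illustration. It also assumes one possible rule, namely that
a two-token shingle is emitted whenever either token of an adjacent pair
is in the include set (roughly what CommonGrams does). The actual patch
may choose a different rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: illustrates include-word
// shingling on plain strings instead of a TokenStream.
class IncludeWordShingles {

    // Emits every token as a unigram. Additionally emits a two-token
    // shingle when either token of an adjacent pair is in includeWords
    // (an assumed rule, chosen for illustration only).
    static List<String> shingle(List<String> tokens, Set<String> includeWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (includeWords.contains(current) || includeWords.contains(next)) {
                    out.add(current + " " + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "fox", "of", "doom");
        Set<String> includeWords = Set.of("the", "of");
        // prints [the, the quick, quick, fox, fox of, of, of doom, doom]
        System.out.println(shingle(tokens, includeWords));
    }
}
```

Note that "quick fox" is not emitted because neither word is in the
include set, which is the reduction in index size the include words are
meant to buy.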

This is similar to Nutch CommonGrams and SOLR-908. SOLR-908
does not use the new token attribute API, and I figured this
functionality is better suited to being part of Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

