lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-2400) ShingleFilter: don't output all-filler shingles/unigrams; also, convert from TermAttribute to CharTermAttribute
Date Sun, 18 Apr 2010 15:30:25 GMT
ShingleFilter: don't output all-filler shingles/unigrams; also, convert from TermAttribute
to CharTermAttribute
---------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-2400
                 URL: https://issues.apache.org/jira/browse/LUCENE-2400
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/analyzers
    Affects Versions: 3.0.1
            Reporter: Steven Rowe
            Priority: Minor


When the input token stream to ShingleFilter has position increments greater than one, filler
tokens are inserted for each position for which there is no token in the input token stream.
As a result, unigrams (if configured) and shingles can be filler-only. Filler-only output
tokens make no sense; these should be removed.
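
To illustrate the behavior described above, here is a minimal standalone sketch. It is not
the ShingleFilter code and does not use the Lucene API; the class name ShingleSketch, the
methods expand and shingles, and the sample terms are all hypothetical, and the filler is
assumed to be rendered as "_". It expands position gaps into fillers, builds shingles up to
a maximum size, and skips any output made up only of fillers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the actual ShingleFilter implementation. */
public class ShingleSketch {
    // Assumed filler representation for this sketch.
    static final String FILLER = "_";

    /** Expands terms into one entry per position, adding a filler for each skipped position. */
    public static List<String> expand(String[] terms, int[] posIncs) {
        List<String> positions = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // A position increment of n > 1 means n - 1 positions have no token.
            for (int gap = 1; gap < posIncs[i]; gap++) {
                positions.add(FILLER);
            }
            positions.add(terms[i]);
        }
        return positions;
    }

    /** Builds shingles of up to maxSize entries, dropping those that contain only fillers. */
    public static List<String> shingles(List<String> positions, int maxSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < positions.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= positions.size(); size++) {
                List<String> window = positions.subList(start, start + size);
                boolean allFiller = window.stream().allMatch(FILLER::equals);
                if (!allFiller) {
                    out.add(String.join(" ", window));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "divide" arrives with a position increment of 3, so two positions are empty.
        List<String> positions = expand(new String[] {"please", "divide"}, new int[] {1, 3});
        System.out.println(positions);                    // [please, _, _, divide]
        System.out.println(shingles(positions, 2, true)); // [please, please _, _ divide, divide]
    }
}
```

Without the all-filler check, the same input would also yield the unigram "_" (twice) and
the bigram "_ _", which are the outputs this issue proposes to remove.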

Also, because TermAttribute has been deprecated in favor of CharTermAttribute, the patch will
convert TermAttribute usages in ShingleFilter to CharTermAttribute.


        
