lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
Date Fri, 05 Mar 2010 14:12:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841872#action_12841872 ]

Uwe Schindler commented on LUCENE-2295:
---------------------------------------

The TokenFilter is quite easy; it needs only a few lines of code:
- no attributes to be registered
- use a counter initialized to 0
- override incrementToken() to increment the counter whenever the input returns true; return false once the counter reaches the limit or the input is exhausted
- reset() resets the counter
- no other methods need to be overridden (this emulates the original behaviour of MaxFieldLength)
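The bullet list above can be sketched roughly as follows. To keep the example compilable without Lucene on the classpath, `SimpleTokenStream`, `SimpleTokenFilter` and `ListTokenStream` are hypothetical stand-ins that only mimic the relevant parts of Lucene's TokenStream/TokenFilter contract (incrementToken(), reset(), and a term shared with the input instead of registered attributes). `MaxTokenCountFilter` is likewise an illustrative name, not an existing Lucene class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream: incrementToken() advances
// to the next token and returns false once the stream is exhausted.
abstract class SimpleTokenStream {
    abstract boolean incrementToken();
    void reset() {}
    abstract String term(); // stand-in for reading the shared term attribute
}

// Hypothetical source stream over a fixed list of terms.
class ListTokenStream extends SimpleTokenStream {
    private final List<String> terms;
    private int pos = -1;
    ListTokenStream(List<String> terms) { this.terms = terms; }
    @Override boolean incrementToken() {
        if (pos + 1 >= terms.size()) return false;
        pos++;
        return true;
    }
    @Override void reset() { pos = -1; }
    @Override String term() { return terms.get(pos); }
}

// Stand-in for TokenFilter: reset() is forwarded to the input, and the term
// is read straight from the input (mimicking shared attributes).
abstract class SimpleTokenFilter extends SimpleTokenStream {
    protected final SimpleTokenStream input;
    SimpleTokenFilter(SimpleTokenStream input) { this.input = input; }
    @Override void reset() { input.reset(); }
    @Override String term() { return input.term(); }
}

// The filter from the list above: count tokens, stop at the limit.
class MaxTokenCountFilter extends SimpleTokenFilter {
    private final int maxTokenCount;
    private int tokenCount = 0; // counter starts at 0

    MaxTokenCountFilter(SimpleTokenStream input, int maxTokenCount) {
        super(input);
        this.maxTokenCount = maxTokenCount;
    }

    @Override boolean incrementToken() {
        // Check the limit first, so no token beyond the limit is pulled from the input.
        if (tokenCount < maxTokenCount && input.incrementToken()) {
            tokenCount++; // update the counter only when a token was produced
            return true;
        }
        return false; // limit reached or input exhausted
    }

    @Override void reset() {
        super.reset();  // reset the wrapped stream
        tokenCount = 0; // and the counter
    }
}

class TokenLimitDemo {
    // Consumes a stream and collects its terms.
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term());
        return out;
    }
}
```

For example, wrapping a five-term stream with a limit of 3 yields only the first three terms, and after reset() the same three terms are produced again. Checking the counter before calling the input means the filter never consumes a token past the limit; as with MaxFieldLength, the remaining tokens are simply dropped.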

The Analyzer is more complicated, as it should respect reusable streams. It should work like
QueryAutoStopWordAnalyzer and maintain a Map of field names to cached streams. To detect
whether reusableTokenStream() has reused a stream, compare it with the cached one; if it is a new stream, wrap it.
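The reuse check described above might look like the following sketch. `PerFieldWrapCache` is a hypothetical helper, written generically so it compiles without Lucene: `S` stands for the stream returned by the wrapped analyzer's reusableTokenStream() and `W` for the filter wrapped around it. It assumes the cached filter is reset when it is reused (so its counter starts again at 0), and it uses a plain, non-thread-safe HashMap where a real Analyzer would need per-thread storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: per field name, remembers the last stream returned by
// the wrapped analyzer together with the filter that wraps it.
final class PerFieldWrapCache<S, W> {
    private static final class Entry<A, B> {
        final A source;  // stream returned by the wrapped analyzer
        final B wrapped; // our filter around that stream
        Entry(A source, B wrapped) { this.source = source; this.wrapped = wrapped; }
    }

    private final Map<String, Entry<S, W>> cache = new HashMap<>(); // not thread-safe
    private final Function<S, W> wrapper;      // e.g. s -> a token-limiting filter around s
    private final Consumer<W> resetOnReuse;    // e.g. w -> reset the filter's counter

    PerFieldWrapCache(Function<S, W> wrapper, Consumer<W> resetOnReuse) {
        this.wrapper = wrapper;
        this.resetOnReuse = resetOnReuse;
    }

    W wrap(String fieldName, S source) {
        Entry<S, W> e = cache.get(fieldName);
        if (e != null && e.source == source) {
            // Identity check: the wrapped analyzer reused its stream, so reuse our filter too.
            resetOnReuse.accept(e.wrapped);
            return e.wrapped;
        }
        // New (or first) stream for this field: wrap it and cache the pair.
        W w = wrapper.apply(source);
        cache.put(fieldName, new Entry<>(source, w));
        return w;
    }
}
```

The comparison is by identity (`==`), not equals(), because the question is whether the wrapped analyzer handed back the very same stream instance it returned last time for that field.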

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2295
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Shai Erera
>             Fix For: 3.1
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify his requested MFL limit on IndexWriter,
> we can get rid of this setting entirely by providing an Analyzer which will wrap any other Analyzer and its
> TokenStream with a TokenFilter that keeps track of the number of tokens produced and stops when the limit
> has been reached.
> This will remove any count tracking in IW's indexing, which is done even if I specified UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.



