lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2295) Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
Date Fri, 05 Mar 2010 18:04:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841959#action_12841959 ]

Uwe Schindler commented on LUCENE-2295:
---------------------------------------

After some discussion with rmuir, we realized that explicitly reusing the filter does not
make sense. Maintaining the Map<String,ReuseableStream> costs more in resources and
maintenance effort than simply creating a new filter instance, which has no initialization cost.
A simple implementation of the Analyzer would be:

- Override reusableTokenStream so that it delegates to the inner analyzer and wraps the result
with the filter. The cost of creating the filter is negligible, as the filter has no
initialization cost (it uses no attributes, does not create attribute maps, ...).
- Override tokenStream to do the same, but delegate to the inner analyzer's tokenStream
method instead.
- Make this analyzer final, otherwise we need VirtualMethod (the same applies to the
TokenFilter, of course).
- Override the rest of the methods in Analyzer and simply delegate. Don't forget the position
increment gap methods (e.g. getPositionIncrementGap) and so on!
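
The steps above can be sketched roughly as follows. This is only an illustration of the
wrapping idea, not the patch itself: SketchTokenStream and SketchAnalyzer are simplified
stand-ins invented here so the example compiles without Lucene (the real TokenStream and
Analyzer work with attributes and throw IOException), WhitespaceSketchAnalyzer is a toy
inner analyzer, and the names MaxFieldLengthFilter and MaxFieldLengthAnalyzer are assumed
from the issue title.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

// Simplified stand-ins for Lucene's TokenStream and Analyzer (NOT the real API;
// checked exceptions and attributes are left out to keep the sketch short).
abstract class SketchTokenStream {
    /** Advances to the next token; returns false when the stream is exhausted. */
    abstract boolean incrementToken();
    /** Text of the current token. */
    abstract String term();
}

abstract class SketchAnalyzer {
    abstract SketchTokenStream tokenStream(String fieldName, Reader reader);
    abstract SketchTokenStream reusableTokenStream(String fieldName, Reader reader);
    abstract int getPositionIncrementGap(String fieldName);
}

// Hypothetical filter: counts tokens and stops once the limit is reached.
// It holds only a counter, so creating a new instance per stream is cheap.
final class MaxFieldLengthFilter extends SketchTokenStream {
    private final SketchTokenStream input;
    private final int maxTokens;
    private int count = 0;

    MaxFieldLengthFilter(SketchTokenStream input, int maxTokens) {
        this.input = input;
        this.maxTokens = maxTokens;
    }

    @Override boolean incrementToken() {
        if (count < maxTokens && input.incrementToken()) {
            count++;
            return true;
        }
        return false;
    }

    @Override String term() { return input.term(); }
}

// Hypothetical final wrapper: every method delegates to the inner analyzer,
// and both stream methods wrap the result in a fresh filter.
final class MaxFieldLengthAnalyzer extends SketchAnalyzer {
    private final SketchAnalyzer delegate;
    private final int maxTokens;

    MaxFieldLengthAnalyzer(SketchAnalyzer delegate, int maxTokens) {
        this.delegate = delegate;
        this.maxTokens = maxTokens;
    }

    @Override SketchTokenStream tokenStream(String fieldName, Reader reader) {
        return new MaxFieldLengthFilter(delegate.tokenStream(fieldName, reader), maxTokens);
    }

    @Override SketchTokenStream reusableTokenStream(String fieldName, Reader reader) {
        return new MaxFieldLengthFilter(delegate.reusableTokenStream(fieldName, reader), maxTokens);
    }

    @Override int getPositionIncrementGap(String fieldName) {
        return delegate.getPositionIncrementGap(fieldName);
    }
}

// Toy inner analyzer that splits on whitespace, used only for the demo.
final class WhitespaceSketchAnalyzer extends SketchAnalyzer {
    @Override SketchTokenStream tokenStream(String fieldName, Reader reader) {
        final Scanner scanner = new Scanner(reader);
        return new SketchTokenStream() {
            private String current;
            @Override boolean incrementToken() {
                if (!scanner.hasNext()) return false;
                current = scanner.next();
                return true;
            }
            @Override String term() { return current; }
        };
    }

    @Override SketchTokenStream reusableTokenStream(String fieldName, Reader reader) {
        return tokenStream(fieldName, reader); // no real reuse in this toy version
    }

    @Override int getPositionIncrementGap(String fieldName) { return 0; }
}

class MaxFieldLengthDemo {
    public static void main(String[] args) {
        SketchAnalyzer limited = new MaxFieldLengthAnalyzer(new WhitespaceSketchAnalyzer(), 3);
        SketchTokenStream ts = limited.tokenStream("body", new StringReader("a b c d e"));
        while (ts.incrementToken()) {
            System.out.println(ts.term()); // prints a, b, c on separate lines
        }
    }
}
```

Creating a new MaxFieldLengthFilter on every call keeps the analyzer free of per-field
state, which is the trade-off argued for above: a counter-only wrapper is cheaper than
maintaining a map of reusable streams.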

I will supply a patch with filter and analyzer later.

> Create a MaxFieldLengthAnalyzer to wrap any other Analyzer and provide the same functionality as MaxFieldLength provided on IndexWriter
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2295
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2295
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>
> A spinoff from LUCENE-2294. Instead of asking the user to specify his requested MFL
> limit on IndexWriter, we can get rid of this setting entirely by providing an Analyzer which
> will wrap any other Analyzer and its TokenStream with a TokenFilter that keeps track of the
> number of tokens produced and stops when the limit has been reached.
> This will remove any count tracking in IW's indexing, which is done even if I specified
> UNLIMITED for MFL.
> Let's try to do it for 3.1.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

