lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1842) Add reset(AttributeSource) method to AttributeSource
Date Sat, 22 Aug 2009 08:11:14 GMT


Uwe Schindler commented on LUCENE-1842:

I still do not understand your proposal. You can always create all tokenizer chains at the
beginning with exactly one tokenizer (after LUCENE-1826). You are then free to call incrementToken()
on all sub-tokenstreams and all these calls will put the tokenized values in the same attributes.
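
To illustrate the point about shared attributes, here is a minimal sketch. It does not use the Lucene API: AttrSource, TermAttr and SubStream are hypothetical stand-ins for AttributeSource, a term attribute and a sub-tokenstream. It assumes only that every sub-stream is constructed on top of the same attribute map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Lucene classes.
class SharedAttributesSketch {

    /** Mutable holder playing the role of a term attribute. */
    static final class TermAttr {
        String term = "";
    }

    /** Minimal attribute registry: at most one instance per attribute class. */
    static class AttrSource {
        private final Map<Class<?>, Object> attributes;

        AttrSource() {
            this.attributes = new HashMap<>();
        }

        /** Shares the attribute map of {@code input} instead of creating a new one. */
        AttrSource(AttrSource input) {
            this.attributes = input.attributes;
        }

        /** Returns the existing attribute if present, otherwise creates and registers it. */
        TermAttr addTermAttr() {
            return (TermAttr) attributes.computeIfAbsent(TermAttr.class, k -> new TermAttr());
        }
    }

    /** Sub-stream that writes each of its tokens into the shared attribute. */
    static final class SubStream extends AttrSource {
        private final Iterator<String> tokens;
        private final TermAttr termAttr;

        SubStream(AttrSource shared, String... tokens) {
            super(shared);
            this.tokens = Arrays.asList(tokens).iterator();
            this.termAttr = addTermAttr(); // looked up once, at construction time
        }

        boolean incrementToken() {
            if (!tokens.hasNext()) {
                return false;
            }
            termAttr.term = tokens.next();
            return true;
        }
    }

    /** Drains two sub-streams in turn while reading from a single attribute instance. */
    static List<String> concatenate() {
        AttrSource shared = new AttrSource();
        TermAttr term = shared.addTermAttr();
        SubStream[] subs = {
            new SubStream(shared, "quick", "brown"),
            new SubStream(shared, "fox")
        };
        List<String> seen = new ArrayList<>();
        for (SubStream sub : subs) {
            while (sub.incrementToken()) {
                seen.add(term.term);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(concatenate());
    }
}
```

The consumer never switches attribute instances: whichever sub-stream is advanced, the value appears in the same TermAttr.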

Adding a reset(AttributeSource) method would not really help, as you would have to do this
for the whole Tokenizer chain. If you do it the wrong way, some TokenFilters in the chain
may end up using a different AttributeSource, and so on. Because of all these problems
and the complexity, we do not want setters for AttributeSources, changes of the AttributeFactory,
and so on. During the lifetime of one TokenStream there is, in my opinion, no real use case
for changing its attribute maps that would justify the added complexity and risk of errors.
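
The risk can be sketched as follows. Again this is not Lucene code: AttrSource, TermAttr and UpperCaseFilter are hypothetical stand-ins, and reset(AttrSource) is only modeled on the method proposed in the issue. The assumption is a filter that looks up its attribute once in the constructor, which is the pattern the proposal would forbid.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Lucene classes.
class StaleAttributeSketch {

    /** Mutable holder playing the role of a term attribute. */
    static final class TermAttr {
        String term = "";
    }

    static class AttrSource {
        // Non-final, because the modeled reset(AttrSource) has to replace it.
        private Map<Class<?>, Object> attributes = new HashMap<>();

        /** Returns the existing attribute if present, otherwise creates and registers it. */
        TermAttr addTermAttr() {
            return (TermAttr) attributes.computeIfAbsent(TermAttr.class, k -> new TermAttr());
        }

        /** Modeled on the proposed method: adopt the attribute map of another source. */
        void reset(AttrSource input) {
            if (input == null) {
                throw new IllegalArgumentException("input AttrSource must not be null");
            }
            this.attributes = input.attributes;
        }
    }

    /** Filter that caches its attribute in the constructor. */
    static final class UpperCaseFilter extends AttrSource {
        private final TermAttr termAttr = addTermAttr();

        void process() {
            termAttr.term = termAttr.term.toUpperCase(Locale.ROOT);
        }
    }

    /** Returns what a consumer of the per-thread source sees after the filter has run. */
    static String demo() {
        UpperCaseFilter filter = new UpperCaseFilter();
        AttrSource perThread = new AttrSource();
        TermAttr shared = perThread.addTermAttr();

        filter.reset(perThread); // the map is swapped, but the cached termAttr is not
        shared.term = "fox";
        filter.process();        // modifies the filter's old, now orphaned attribute
        return shared.term;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch demo() returns the unmodified "fox": the filter silently worked on an attribute nobody reads any more. Avoiding that would require every component in the chain to look its attributes up again after each reset.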

The cost of adding Attributes is very low if you reuse TokenStreams, which you could even do
with your concatenating TokenStream.

> Add reset(AttributeSource) method to AttributeSource
> ----------------------------------------------------
>                 Key: LUCENE-1842
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Wish
>          Components: Analysis
>            Reporter: Tim Smith
>            Priority: Minor
>             Fix For: 2.9
> Originally proposed in LUCENE-1826
> Proposing the addition of the following method to AttributeSource
> {code}
> public void reset(AttributeSource input) {
>     if (input == null) {
>       throw new IllegalArgumentException("input AttributeSource must not be null");
>     }
>     this.attributes = input.attributes;
>     this.attributeImpls = input.attributeImpls;
>     this.factory = input.factory;
> }
> {code}
> Impacts:
> * requires all TokenStreams/TokenFilters/etc to call addAttribute() in their reset() method, not in their constructor
> * requires making AttributeSource.attributes and AttributeSource.attributeImpls non-final
> Advantages:
> Allows creating only a single actual AttributeSource per thread that can then be used for indexing with a multitude of TokenStream/Tokenizer combinations (allowing utmost reuse of TokenStream/Tokenizer instances)
> this results in only a single "attributes"/"attributeImpls" map being required per thread
> addAttribute() calls will almost always return right away (will only be "initialized" once per thread)

