lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: attribute thoughts
Date Wed, 19 Aug 2009 00:01:51 GMT
On 8/14/09 9:23 AM, Yonik Seeley wrote:
> On Thu, Aug 13, 2009 at 4:32 PM, Michael Busch<buschmic@gmail.com>  wrote:
>    
>> On 8/13/09 7:29 AM, Yonik Seeley wrote:
>>      
>>> I'm liking the new attribute based analysis (in conjunction with
>>> reusability), but I'm running into some questions...
>>>
>>> Is it valid for tokenizers or token filters to add new attributes after
>>> their constructor (after they have processed some tokens)?
>>>
>>>        
>> At the moment we're saying in the javadocs of TokenStream that all
>> Attributes should be
>> added up front.
>>      
> Hmmm, OK... in which case, token producers using restoreState() would
> not have to call clearAttributes() first.
>
>    
>> We could change these semantics. I had some thoughts about
>> it in the original
>> JIRA issue (LUCENE-1422).
>>      
> Apologies if I'm rehashing anything - it's hard to keep up with some
> of those monster (high volume) issues.
>
>    
>> So back to your question of whether we should allow restoreState() to add
>> attributes and use a state across different AttributeSources: the
>> complication is that we can only allow that if the different
>> AttributeSources were filled using the same AttributeFactory, otherwise
>> different AttributeImpls could be in the sources and the copying wouldn't
>> work anymore.
>>      
> Hmmm, so perhaps just an assertion that the factories are equal... and
> documentation saying that moving state from one stream to the other
> requires identical factories?  Anyway, I don't currently have a use
> case for this... I was just wondering.
>    

Yes that should work. We basically have such an assertion in 
TeeSinkTokenFilter:
   public void addSinkTokenStream(final SinkTokenStream sink) {
     // check that sink has correct factory
     if (!this.getAttributeFactory().equals(sink.getAttributeFactory())) {
       throw new IllegalArgumentException("The supplied sink is not compatible to this tee");
     }

So I agree, we should just do the same in restoreState().

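
To make the idea concrete, here is a rough, self-contained sketch of what such a check could look like. It does not use the real Lucene classes: SketchFactory, SketchState and SketchSource are hypothetical stand-ins, and the assumption that a captured state remembers the factory of the source it came from is mine, not something the current State does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an AttributeFactory; equality is by name only.
final class SketchFactory {
    private final String name;

    SketchFactory(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof SketchFactory && ((SketchFactory) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}

// Hypothetical captured state. Assumption: it records the factory of the
// source it was captured from, so that restoreState() can compare factories.
final class SketchState {
    final SketchFactory factory;
    final Map<String, String> values;

    SketchState(SketchFactory factory, Map<String, String> values) {
        this.factory = factory;
        this.values = values;
    }
}

// Hypothetical stand-in for an AttributeSource with string-valued attributes.
final class SketchSource {
    private final SketchFactory factory;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SketchSource(SketchFactory factory) { this.factory = factory; }

    void set(String attribute, String value) { attributes.put(attribute, value); }

    String get(String attribute) { return attributes.get(attribute); }

    // Copies the current attribute values together with this source's factory.
    SketchState captureState() {
        return new SketchState(factory, new LinkedHashMap<>(attributes));
    }

    // Same kind of check as in addSinkTokenStream(): refuse a state that was
    // captured from a source built with a different factory.
    void restoreState(SketchState state) {
        if (!factory.equals(state.factory)) {
            throw new IllegalArgumentException(
                "The supplied state is not compatible with this source");
        }
        // Adds attributes that are missing here and overwrites existing ones.
        attributes.putAll(state.values);
    }
}
```

With something like this, a state captured from one source can be restored into another source built with an equal factory, and a source built with a different factory rejects it with an IllegalArgumentException.
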
> Another thing I was wondering about was the opacity of State - one
> can't inspect or change the attributes w/o restoring it first.
> Undesirable limitation, or feature allowing more flexible state
> implementations?
>
>    

Excellent point! This limitation is currently there to discourage
changing values of a state, because that would be rather inefficient:
you'd have to look up the attribute(s) of each state you want to change.
We could write a StateContainer, which has an API to access states in an
efficient way (iterator, random access), using delegation.
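
Purely as an illustration of the shape such a container might take, here is a small sketch. Everything in it is hypothetical: StateContainerSketch, slotOf() and the use of plain string values are assumptions made for the example and not an existing or proposed Lucene API. The point is only that the attribute is resolved to a slot once, and each stored state is then read or changed by index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical container; none of these names exist in Lucene.
final class StateContainerSketch implements Iterable<String[]> {
    // Fixed attribute order shared by all stored states.
    private final List<String> attributeNames;
    // One value array per stored state, indexed by slot.
    private final List<String[]> states = new ArrayList<>();

    StateContainerSketch(String... attributeNames) {
        this.attributeNames = Arrays.asList(attributeNames.clone());
    }

    // Resolve an attribute name to its slot once, instead of once per state.
    int slotOf(String attributeName) {
        int slot = attributeNames.indexOf(attributeName);
        if (slot < 0) {
            throw new IllegalArgumentException("Unknown attribute: " + attributeName);
        }
        return slot;
    }

    // Stores a copy of the given values as a new state.
    void add(String... values) {
        if (values.length != attributeNames.size()) {
            throw new IllegalArgumentException("Expected " + attributeNames.size() + " values");
        }
        states.add(values.clone());
    }

    int size() { return states.size(); }

    // Random access: read or change one attribute of one stored state.
    String get(int stateIndex, int slot) { return states.get(stateIndex)[slot]; }

    void set(int stateIndex, int slot, String value) { states.get(stateIndex)[slot] = value; }

    // Sequential access, delegated to the backing list's iterator.
    @Override public Iterator<String[]> iterator() { return states.iterator(); }
}
```

A caller would call slotOf("type") once and then use get() and set() with that slot for every state it wants to touch, which avoids the per-state lookup mentioned above.
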

When I changed the contrib TokenStreams this limitation was somewhat
annoying for some streams, but in all cases it was possible to implement
the streams far more efficiently by avoiding excessive caching (except
ShingleMatrixFilter, where I eventually gave up, not knowing that code
at all).

So I agree we should come up with a good API here for convenience, but
mention in the javadocs that it should only be used carefully.

> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
>    



