lucene-dev mailing list archives

From Michael Busch <>
Subject Re: attribute thoughts
Date Wed, 19 Aug 2009 00:01:51 GMT
On 8/14/09 9:23 AM, Yonik Seeley wrote:
> On Thu, Aug 13, 2009 at 4:32 PM, Michael Busch<>  wrote:
>> On 8/13/09 7:29 AM, Yonik Seeley wrote:
>>> I'm liking the new attribute based analysis (in conjunction with
>>> reusability), but I'm running into some questions...
>>> Is it valid for tokenizers or token filters to add new attributes after
>>> their constructor (after they have processed some tokens)?
>> At the moment we're saying in the javadocs of TokenStream that all
>> Attributes should be
>> added up front.
> Hmmm, OK... in which case, token producers using restoreState() would
> not have to call clearAttributes() first.
>> We could change these semantics. I had some thoughts about
>> it in the original
>> JIRA issue (LUCENE-1422).
> Apologies if I'm rehashing anything - it's hard to keep up with some
> of those monster (high volume) issues.
>> So back to your question of whether we should allow restoreState() to add attributes
>> and use a state across different AttributeSources: the complication is that we can
>> allow that only if the different AttributeSources were filled using the same AttributeFactory;
>> otherwise different AttributeImpls could be in the sources and the copying wouldn't
>> work anymore.
> Hmmm, so perhaps just an assertion that the factories are equal... and
> documentation saying that moving state from one stream to the other
> requires identical factories?  Anyway, I don't currently have a use
> case for this... I was just wondering.

Yes, that should work. We basically have such an assertion in 
   public void addSinkTokenStream(final SinkTokenStream sink) {
     // check that sink has correct factory
     if (!this.getAttributeFactory().equals(sink.getAttributeFactory())) {
       throw new IllegalArgumentException("The supplied sink is not compatible to this tee");
     }
     // ...
   }

So I agree, we should just do the same in restoreState().
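
A minimal sketch of what that check could look like in restoreState(), using toy
stand-ins rather than the real Lucene classes. ToySource, its State, and the
string-valued attributes are all hypothetical; only the factory comparison
mirrors the sink check above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for an AttributeSource; every name here is hypothetical. */
final class ToySource {
    /** Opaque snapshot that remembers which factory its source was built with. */
    static final class State {
        private final Object factory;
        private final Map<String, String> values;

        private State(Object factory, Map<String, String> values) {
            this.factory = factory;
            this.values = values;
        }
    }

    private final Object factory;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    ToySource(Object factory) {
        this.factory = factory;
    }

    void set(String name, String value) {
        attributes.put(name, value);
    }

    String get(String name) {
        return attributes.get(name);
    }

    /** Copies the current attribute values into a new State. */
    State captureState() {
        return new State(factory, new LinkedHashMap<>(attributes));
    }

    /** Rejects states captured from a source with a different factory. */
    void restoreState(State state) {
        if (!factory.equals(state.factory)) {
            throw new IllegalArgumentException(
                "The supplied state is not compatible with this source");
        }
        // With equal factories, copying (and adding) attributes is safe.
        attributes.putAll(state.values);
    }
}
```

With this guard, moving a state between two sources built from equal factories
works, and any other combination fails fast instead of copying into the wrong
implementation.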
> Another thing I was wondering about was the opacity of State - one
> can't inspect or change the attributes w/o restoring it first.
> Undesirable limitation, or feature allowing more flexible state
> implementations?

Excellent point! This limitation is currently there to discourage changing
values of a state, because that would be rather inefficient: you'd have to
look up the attribute(s) of each state you want to change. We could write a
StateContainer, which has an API to access states in an efficient way
(iterator, random access), using 

When I changed the contrib TokenStreams this limitation was somewhat
inconvenient for some streams, but in all cases it was possible to implement
the streams far more efficiently by avoiding excessive caching (except
ShingleMatrixFilter; I gave up eventually, not knowing that code at all).

So I agree we should come up with a good API here for convenience, but 
note in the javadocs that it should only be used carefully.
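
To make the idea concrete, here is a rough sketch of the shape such a
StateContainer might take. This is not an existing Lucene class: the name
comes from the suggestion above, while the method names and the
map-of-strings representation of a state are assumptions made purely for
illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical StateContainer sketch; states are modeled as name/value maps. */
final class StateContainer implements Iterable<Map<String, String>> {
    private final List<Map<String, String>> states = new ArrayList<>();

    /** Stores a copy of the given attribute values as a new state. */
    void add(Map<String, String> attributeValues) {
        states.add(new LinkedHashMap<>(attributeValues));
    }

    int size() {
        return states.size();
    }

    /** Random access: reads one attribute of the state at the given index. */
    String get(int index, String attribute) {
        return states.get(index).get(attribute);
    }

    /** Changes one attribute of a stored state without restoring it first. */
    void set(int index, String attribute, String value) {
        states.get(index).put(attribute, value);
    }

    /** Sequential access to the stored states in insertion order. */
    @Override
    public Iterator<Map<String, String>> iterator() {
        return states.iterator();
    }
}
```

A stream that buffers tokens could then adjust, say, one attribute of the
third buffered state with a single set(2, ...) call rather than restoring,
modifying, and re-capturing it.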

> -Yonik