lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: New Token API was Re: Payloads and TrieRangeQuery
Date Mon, 15 Jun 2009 19:38:52 GMT
> On Mon, Jun 15, 2009 at 3:00 PM, Uwe Schindler<uwe@thetaphi.de> wrote:
> > There is a new Attribute called ShiftAttribute (or
> NumericShiftAttribute),
> > when trie range is moved to core. This attribute contains the shifted-
> away
> > bits from the prefix encoded value during trie indexing.
> 
> I was wondering about this....
> To make use of ShiftAttribute, you need to understand the trie
> encoding scheme itself.  If you understood that, you'd be able to look
> at the actual token value if you were interested in what shift was
> used.  So it's redundant, has a runtime cost, it's not currently used
> anywhere, and it's not useful to fields other than Trie.  Perhaps it
> shouldn't exist (yet)?

The idea was to make the indexing process controllable. You were the one
who asked, for example, for the possibility of adding payloads to trie
fields. Using the shift attribute, you have full control over the token
types. OK, it's a little bit redundant; you could also use the
TypeAttribute (which is already used to mark highest-precision and
lower-precision values).
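
To make the redundancy point concrete, here is a minimal sketch of how a
shift can be read back from a prefix-encoded token. The encoding is a toy
one invented for illustration, not Lucene's actual trie encoding: the
class ToyTrieTokens, the SHIFT_BASE constant, and the hex body are all
assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the real trie encoding used by Lucene.
final class ToyTrieTokens {
    // Hypothetical marker base: the first char of a token is SHIFT_BASE + shift.
    static final char SHIFT_BASE = 0x20;

    // Drop the lowest `shift` bits and prepend a char that records the shift.
    static String encode(long value, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be in 0..63");
        }
        long prefix = value >>> shift;
        return (char) (SHIFT_BASE + shift) + Long.toHexString(prefix);
    }

    // Recover the shift from the token text alone, without a separate attribute.
    static int shiftOf(String token) {
        return token.charAt(0) - SHIFT_BASE;
    }

    // One token per precision level: shift = 0, step, 2*step, ...
    static List<String> tokensFor(long value, int precisionStep) {
        List<String> tokens = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            tokens.add(encode(value, shift));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokensFor(0x1234L, 16)) {
            System.out.println("shift=" + shiftOf(token)
                    + " prefix=" + token.substring(1));
        }
    }
}
```

In this model a consumer that knows the encoding can call shiftOf on the
term text, which is the redundancy mentioned above; a dedicated attribute
would only save consumers from having to know the encoding.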

One question about the whole TokenStream: originally we discussed
payloads/positions and TrieRange. If this is implemented in a future
version, how should I set the PositionIncrement/Offsets in the token
stream so that the tokens end up at position 0 in the index? I do not
understand the indexing process here, especially the deprecated boolean
flag about something negative (not sure what the name was). Should I set
PositionIncrement to 0 for all trie fields by default? And what about
PositionIncrementGap when indexing more than one field? None of this is
really clear to me. Using the position would be simpler to implement, but
an attribute that is indexed together with the other attributes, like a
payload, would be the ideal solution for future versions of TrieRange.
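
As a way to reason about the question, here is a simplified model of how
position increments and a gap between values could add up to index
positions. It is a sketch under assumptions, not the indexer's actual
code: the class ToyPositions is hypothetical, the counter is assumed to
start at -1, and a leading increment of 0 is assumed to be clamped to
position 0 instead of producing -1.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: NOT the real indexing chain.
final class ToyPositions {
    // valueIncrements[v][i] is the position increment of token i in value v.
    // `gap` is added once between consecutive values of the same field.
    static List<Integer> positions(int[][] valueIncrements, int gap) {
        List<Integer> out = new ArrayList<>();
        int pos = -1; // assumed start, so a first increment of 1 gives position 0
        for (int[] increments : valueIncrements) {
            if (!out.isEmpty()) {
                pos += gap; // gap applies only after at least one token was seen
            }
            for (int incr : increments) {
                // Assumed clamp: never let a position fall below 0.
                pos = Math.max(0, pos + incr);
                out.add(pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] twoValues = { {0, 0, 0}, {0, 0, 0} };
        System.out.println(positions(twoValues, 0));
        System.out.println(positions(new int[][] { {1, 0}, {1, 0} }, 10));
    }
}
```

Under these assumptions, setting every increment and the gap to 0 keeps
all trie tokens of all values at position 0, while a nonzero gap would
push the tokens of later values to higher positions.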

(Maybe we could also use the Offset attribute for the highest precision
bits)

Uwe

