lucene-dev mailing list archives

From "Uwe Schindler"
Subject RE: Payloads and TrieRangeQuery
Date Wed, 10 Jun 2009 21:07:12 GMT
> On Wed, Jun 10, 2009 at 3:43 PM, Michael McCandless wrote:
> > On Wed, Jun 10, 2009 at 3:19 PM, Yonik Seeley wrote:
> >
> >>> And this information about the trie
> >>> structure and where payloads are should be stored in FieldInfos.
> >>
> >> As is the case today, the info is encoded in the class you use (and
> >> its settings)... no need to add it to the index structure.  In any
> >> case, it's a completely different issue and shouldn't be tied to
> >> TrieRange improvements.
> >
> > The problem is, because the details of Trie* at index time affect
> > what's in each segment, this information needs to be stored per
> > segment.
> That's the case with the analysis for every field.  If you change your
> analyzer in a non-compatible fashion, you need to re-index.

I agree with Mike that information like the data type should be stored in
the index, but on the other hand, Yonik is correct, too. If I change my
analyzer (and TrieTokenStream is in fact one: an analyzer that creates
tokens out of a number), I have to reindex.

Storing different indexing settings (precisionStep, payload/position bits)
per segment would make merging nearly impossible, so I would not do this
(see also Earwin's comment about that).

About releasing 2.9:
I would really like to leave this optimization out of 2.9; we can still add
it afterwards. The number of bits encoded into the TermPosition is simply 0
for indexes created with 2.9. (This is a really cool idea, thanks Yonik! I
was missing exactly that: you do not need to convert the bits, you can put
them directly into the index as an int and use them on the query side.) With
later versions, you could also shift the lower bits into the TermPosition
and tell TrieRange to filter on them.
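
To make the position-bits idea concrete, here is a minimal sketch. It is not
the TrieRange code: the class and method names are made up for illustration,
and it assumes non-negative values and at most 31 position bits (so the low
bits fit into a non-negative int position). With positionBits = 0, as for
indexes created with 2.9, the term carries the whole value and the position
is always 0.

```java
// Sketch only: hypothetical helper, not part of Lucene's TrieRange API.
final class PositionBitsSketch {
    private PositionBitsSketch() {}

    private static void checkBits(int positionBits) {
        if (positionBits < 0 || positionBits > 31) {
            throw new IllegalArgumentException("positionBits must be in 0..31");
        }
    }

    /** High bits of the value, i.e. the part that would still be indexed as the term. */
    static long termPart(long value, int positionBits) {
        checkBits(positionBits);
        return value >>> positionBits;
    }

    /** Low bits of the value, stored unconverted as an int position. */
    static int positionPart(long value, int positionBits) {
        checkBits(positionBits);
        return (int) (value & ((1L << positionBits) - 1));
    }

    /** Rebuilds the original value from the term part and the position. */
    static long recombine(long termPart, int position, int positionBits) {
        checkBits(positionBits);
        return (termPart << positionBits) | position;
    }

    /** Query-side filter: is the recombined value inside [lower, upper]? */
    static boolean inRange(long termPart, int position, int positionBits,
                           long lower, long upper) {
        long value = recombine(termPart, position, positionBits);
        return value >= lower && value <= upper;
    }
}
```

Only terms at the edges of a range would need the inRange check; terms that
lie completely inside the range match regardless of their position.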

I would like to go forward with moving the classes into the right packages
and optimizing how queries and analyzers are created (only one class for
each). The idea from LUCENE-1673 of using static factories to create these
classes for the different data types seems more elegant and simpler to
maintain than the current way (having a class for each bit size).
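
Roughly, the static-factory approach could look like the sketch below. All
names here are placeholders (the real names are still undecided), and the
class only records its settings instead of running a query; it is meant to
show one class with a private constructor and one factory per data type,
nothing more.

```java
// Sketch only: placeholder names, not the classes proposed in LUCENE-1673.
final class RangeSketch {
    final String field;
    final int valueBits;      // 32 for int ranges, 64 for long ranges
    final int precisionStep;
    final long min;
    final long max;

    // Private constructor: instances are only created through the factories.
    private RangeSketch(String field, int valueBits, int precisionStep,
                        long min, long max) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        this.field = field;
        this.valueBits = valueBits;
        this.precisionStep = precisionStep;
        this.min = min;
        this.max = max;
    }

    /** Factory for 32-bit values; replaces a dedicated int-only class. */
    static RangeSketch newIntRange(String field, int precisionStep,
                                   int min, int max) {
        return new RangeSketch(field, 32, precisionStep, min, max);
    }

    /** Factory for 64-bit values; replaces a dedicated long-only class. */
    static RangeSketch newLongRange(String field, int precisionStep,
                                    long min, long max) {
        return new RangeSketch(field, 64, precisionStep, min, max);
    }

    /** Inclusive range check, standing in for the real term enumeration. */
    boolean contains(long value) {
        return value >= min && value <= max;
    }
}
```

A caller would write something like RangeSketch.newIntRange("price", 4, 10, 20)
and never needs to know about a separate class per bit size.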

So I think I will start with 1673 and try to present something usable soon
(but without payloads, so the payload/position-bits setting is "0").
Now the open question: Which name for the numeric range queries/fields? :-(

