lucene-dev mailing list archives
From "Uwe Schindler" <...@thetaphi.de>
Subject Move TrieRange to Core/Module and integration issues
Date Mon, 13 Apr 2009 15:40:53 GMT
Hi,

This has been discussed many times on this list, but we never reached a
decision on whether TrieRange should be included in the core.

After thinking about it and looking at the latest developments in
TrieRange (TokenStreams for indexing), I plan to do the following:

a) Put the classes into the correct packages:
- (Int|Long)TrieRangeFilter into o.a.l.search, perhaps under the new name
(Int|Long)NumericRangeFilter, or possibly both in one class
NumericRangeFilter (this is possible; the only problem is that you would
have two ctors that take long or int and are otherwise identical, and
implicit widening in the compiler can do bad things...)
- (Int|Long)TrieTokenStream into o.a.l.analysis as NumericTokenStream (same
note as above)
- ShiftAttribute into o.a.l.analysis.tokenattributes
- TrieUtils as the new NumberUtils in a place not yet decided: o.a.l.util?
o.a.l.document?
- The TrieValueSource for LUCENE-831 would move to o.a.l.search (see the
patch there)
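
The ctor concern in the first item can be shown with a minimal sketch. The
class below is a hypothetical stand-in, not a Lucene class; it only records
which overload the compiler picked. The point is that one long argument is
enough to select the long ctor, with the int argument widened silently.

```java
// Hypothetical stand-in for a combined NumericRangeFilter (assumption, not a
// Lucene class). It only remembers which constructor overload was selected.
class NumericRangeFilterSketch {
    final String kind;

    // Selected only when both arguments are of type int.
    NumericRangeFilterSketch(int min, int max) {
        this.kind = "int";
    }

    // Selected as soon as either argument is a long; an int argument on the
    // other side is widened to long without any warning.
    NumericRangeFilterSketch(long min, long max) {
        this.kind = "long";
    }

    public static void main(String[] args) {
        int lower = 1;
        long upper = 100; // e.g. a bound that happens to come from a long variable
        System.out.println(new NumericRangeFilterSketch(1, 100).kind);
        System.out.println(new NumericRangeFilterSketch(lower, upper).kind);
    }
}
```

If int and long values are encoded differently in the index, as I assume
they are for the trie terms, a caller who indexed ints but passes one long
bound would silently search with the wrong encoding. Separate class names
or explicit factory methods would avoid this.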

b) Make NumericRangeQuery (which does not yet exist as its own class) a
subclass of the new MultiTermQuery. That way you get the ConstantScore and
BooleanQuery rewrites and the Filter for free. To enable this, I must create
a Numeric/TrieRangeTermEnum, for which I propose some changes:

MultiTermQuery's protected getEnum() method currently returns
FilteredTermEnum. For TrieRange, the return type should be changed to
TermEnum; a FilteredTermEnum is not needed (FilteredTermEnum is only one
implementation, and the method should return the abstract TermEnum). Once
this is fixed, I can write a NumericRangeTermEnum extends TermEnum that
enumerates the terms of all sub-ranges. This is not possible with
FilteredTermEnum, so it must be a separate extension. FilteredTermEnum could
be used if its inner enum and term members (currently private) were
accessible, but that would be a completely unclean hack. The
NumericRangeTermEnum would get the range bounds from
TrieUtils.RangeBuilder, and its next() method would return the terms,
automatically skipping to the correct term whenever the range changes. Until
TermEnum.skipTo() performs well, it would do this by requesting a new
TermEnum from the IndexReader.
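
The enumeration described above can be sketched without any Lucene classes.
In this simplified model, which is an assumption for illustration only, the
term dictionary is a sorted set of plain strings and each sub-range is an
inclusive [lower, upper] pair, given in ascending order and not overlapping.
The name SubRangeTermIterator is hypothetical; a real NumericRangeTermEnum
would work on prefix-coded terms and the bounds from the range builder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.TreeSet;

// Hypothetical model of an enum that walks several sub-ranges of a sorted
// term dictionary, seeking to the next lower bound on each range change.
class SubRangeTermIterator implements Iterator<String> {
    private final NavigableSet<String> dictionary;
    private final List<String[]> ranges; // each entry: {lower, upper}, inclusive
    private int rangeIndex;
    private String nextTerm;

    SubRangeTermIterator(NavigableSet<String> dictionary, List<String[]> ranges) {
        this.dictionary = dictionary;
        this.ranges = ranges;
        this.nextTerm = seekFrom(0);
    }

    // Seeks to the first term of the first non-empty range at or after 'start'.
    // This models "get a new enum positioned at the lower bound".
    private String seekFrom(int start) {
        for (rangeIndex = start; rangeIndex < ranges.size(); rangeIndex++) {
            String[] range = ranges.get(rangeIndex);
            String candidate = dictionary.ceiling(range[0]);
            if (candidate != null && candidate.compareTo(range[1]) <= 0) {
                return candidate;
            }
        }
        return null; // all ranges exhausted
    }

    @Override
    public boolean hasNext() {
        return nextTerm != null;
    }

    @Override
    public String next() {
        if (nextTerm == null) {
            throw new NoSuchElementException();
        }
        String result = nextTerm;
        String following = dictionary.higher(result);
        String upper = ranges.get(rangeIndex)[1];
        if (following != null && following.compareTo(upper) <= 0) {
            nextTerm = following;               // still inside the current range
        } else {
            nextTerm = seekFrom(rangeIndex + 1); // range change: skip ahead
        }
        return result;
    }

    // Convenience helper that drains the iterator into a list.
    static List<String> collect(NavigableSet<String> dictionary, List<String[]> ranges) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new SubRangeTermIterator(dictionary, ranges);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(Arrays.asList("a", "c", "e", "g", "k", "m"));
        List<String[]> ranges = Arrays.asList(
                new String[] {"b", "e"}, new String[] {"f", "f"}, new String[] {"j", "z"});
        System.out.println(collect(dict, ranges));
    }
}
```

The caller sees one continuous stream of terms and never notices the jumps
between sub-ranges, which is what a MultiTermQuery rewrite would need.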


Any thoughts? How should we proceed with TrieRange?

One more thing: how about storing the "type" information in FieldInfos and
introducing an AbstractField subclass for numbers (NumberField) that returns
the TrieTokenStream from tokenStream()? This could help people with
indexing. When searching, query parsers could use the information to
construct the right queries, sorting would automatically choose the right
ValueSource/Parser, and so on.
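
A rough sketch of how a query parser might use such per-field type
information follows. Everything here is hypothetical: FieldTypeRegistry
stands in for whatever FieldInfos would store, and the method returns only a
textual description of the query it would build, not a real Lucene query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for numeric type information kept per field.
class FieldTypeRegistry {
    enum NumericType { INT, LONG }

    private final Map<String, NumericType> types = new HashMap<>();

    void register(String field, NumericType type) {
        types.put(field, type);
    }

    // Describes the range query a parser could choose for the given field.
    // Fields without a registered numeric type fall back to a term range.
    String describeRangeQuery(String field, String lower, String upper) {
        NumericType type = types.get(field);
        if (type == null) {
            return "TermRange(" + field + ", " + lower + ", " + upper + ")";
        }
        switch (type) {
            case INT:
                return "IntRange(" + field + ", "
                        + Integer.parseInt(lower) + ", " + Integer.parseInt(upper) + ")";
            case LONG:
                return "LongRange(" + field + ", "
                        + Long.parseLong(lower) + ", " + Long.parseLong(upper) + ")";
            default:
                throw new IllegalStateException("unknown type: " + type);
        }
    }

    public static void main(String[] args) {
        FieldTypeRegistry registry = new FieldTypeRegistry();
        registry.register("price", NumericType.INT);
        registry.register("timestamp", NumericType.LONG);
        System.out.println(registry.describeRangeQuery("price", "10", "20"));
        System.out.println(registry.describeRangeQuery("title", "a", "b"));
    }
}
```

With the type stored next to the field, the user would not have to pick the
int or long variant by hand, which would also sidestep the ctor problem
mentioned under a).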

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


