lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1673) Move TrieRange to core
Date Tue, 02 Jun 2009 13:41:07 GMT


Uwe Schindler commented on LUCENE-1673:

(Aside: I just noticed the code fragment in the javadocs for
LongTrieTokenStream won't compile, because the setValue method is not
available for TokenStream; the stream should be defined as
LongTrieTokenStream, I think?; same with IntTrieTokenStream)

I fixed this :-) Thanks!

bq. If we rename the classes, should Solr stay with Trie (because there are different impls)?

Well, Solr should decide 

But: why are there different impls for Solr?

I only mentioned this to point out that Solr has already started to implement this. In Solr
there are three different impls:
- Trie (of course)
- Text-only numbers (do not work with range queries, but can be used for sorting etc.)
- A sortable binary encoding (also used by LocalLucene at the moment). It can be used for
RangeQueries, but sorting is slow (because it has no parser, and at the time it was
implemented, SortField had no parser support)

The problem: for backwards compatibility, they all need to be preserved (so that old indexes
can still be read).

bq. I think separate classes for int, long, float, double is better.

Two more... The problem: all these classes have exactly the same impl internally, which is
code duplication and hard to maintain. Maybe we could use static factories instead of a ctor.
That way the method name differs per type, but each factory just creates a correctly
configured instance of the same class: NumericRangeQuery.newFloatRange(Float a, Float b,
precisionStep) and so on. Same for the TokenStreams (and the Field?)
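
To make the static-factory idea concrete, here is a minimal sketch. It does not use Lucene at all: NumericRangeSketch, its Type enum and its accessors are hypothetical names invented for illustration, and only the newFloatRange(Float a, Float b, precisionStep) shape is taken from the proposal above. The point is that the implementation exists once, behind a private ctor, while the factory name selects the numeric type.

```java
import java.util.Objects;

// Hypothetical stand-in, NOT Lucene's NumericRangeQuery: it only shows the factory pattern.
final class NumericRangeSketch {
    enum Type { INT, LONG, FLOAT, DOUBLE }

    private final Type type;
    private final Number min;        // null = open lower bound
    private final Number max;        // null = open upper bound
    private final int precisionStep;

    // Private ctor: the shared implementation lives here exactly once.
    private NumericRangeSketch(Type type, Number min, Number max, int precisionStep) {
        this.type = Objects.requireNonNull(type, "type");
        this.min = min;
        this.max = max;
        this.precisionStep = precisionStep;
    }

    // One named factory per numeric type; the method name picks the type.
    static NumericRangeSketch newIntRange(Integer min, Integer max, int precisionStep) {
        return new NumericRangeSketch(Type.INT, min, max, precisionStep);
    }

    static NumericRangeSketch newLongRange(Long min, Long max, int precisionStep) {
        return new NumericRangeSketch(Type.LONG, min, max, precisionStep);
    }

    static NumericRangeSketch newFloatRange(Float min, Float max, int precisionStep) {
        return new NumericRangeSketch(Type.FLOAT, min, max, precisionStep);
    }

    static NumericRangeSketch newDoubleRange(Double min, Double max, int precisionStep) {
        return new NumericRangeSketch(Type.DOUBLE, min, max, precisionStep);
    }

    Type type() { return type; }
    Number min() { return min; }
    Number max() { return max; }
    int precisionStep() { return precisionStep; }

    @Override
    public String toString() {
        return type + "[" + (min == null ? "*" : min) + " TO "
                + (max == null ? "*" : max) + "], step=" + precisionStep;
    }
}
```

With boxed parameters, a half-open range is just a null bound, e.g. NumericRangeSketch.newLongRange(10L, null, 8). A side effect of this design is that newLongRange(10, null, 8) does not compile, because Java will not box an int literal to Long, so a forgotten L is caught by the compiler instead of silently selecting another type.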

Ideally, one would simply use, say, LongNumericField (subclass of
AbstractField) at indexing time, Lucene would "remember" this
in the index (this is obviously missing today), and then when you
sort, retrieve value, and create queries from QueryParser, all these
places would "know" that this is a trie field and simply do the right
thing, by default.

For that we need the type information in the index, and for that the new Field/Document classes.
Hopefully Michael will get this working soon.

When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
to your SortField 

Or add new SortField types.

The problem with all this: for old indexes, we need some backwards compatibility. Ideally
we would just create numeric fields in the new way and reuse e.g. SortField.INT for them.
But this cannot be done. Or we could even replace the FieldCache parsers with the trie ones,
but this cannot be done at the moment either.

I'd also like to rename RangeQuery to something else, with this
change. EG TermRangeQuery... to emphasize that you use it for
non-numbers. The javadocs of TermRangeQuery should point to
Int/LongRangeQuery as strongly preferred for numeric ranges.

Cool. For the others, too (FieldCacheRangeQuery).

There is a lot more to decide. I will keep this issue open for a little while to collect
ideas before starting work!

> Move TrieRange to core
> ----------------------
>                 Key: LUCENE-1673
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
> TrieRange was iterated many times and seems stable now (LUCENE-1470, LUCENE-1582, LUCENE-1602).
There is lots of user interest, Solr added it to its default FieldTypes (SOLR-940) and if
possible I want to move it to core before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how should they
be called in core? I would suggest to leave it as it is. On the other hand, if this remains
our only numeric query implementation, we could call it LongRangeQuery, IntRangeQuery or NumericRangeQuery
(see below, there are problems with that). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into one class:
NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The problem here: ctors must be
able to accept int, long, double, float as range parameters. For the end user, mixing these
4 types in one class is hard to handle. If somebody forgets to add an L to a long, it suddenly
instantiates an int version of the range query, hitting no results and so on. Same with other types.
Maybe accept java.lang.Number as parameter (because nullable for half-open bounds) and one
enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into o.a.l.analysis.tokenattributes?
Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these TokenStreams
and omits norms/tf per default for easier addition to Document instances?
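
The missing-L pitfall from the second point above follows from ordinary Java overload resolution, and a small standalone sketch shows it. OverloadPitfallSketch and describeRange are hypothetical names used only for illustration; they stand in for a single class whose ctors are overloaded on int, long, float and double.

```java
// Hypothetical demo class: stands in for one class overloaded on all four primitive types.
final class OverloadPitfallSketch {
    static String describeRange(int min, int max) { return "int"; }
    static String describeRange(long min, long max) { return "long"; }
    static String describeRange(float min, float max) { return "float"; }
    static String describeRange(double min, double max) { return "double"; }

    public static void main(String[] args) {
        // The literal suffix alone decides which overload the compiler picks.
        System.out.println(describeRange(0, 100));       // int: the L was forgotten
        System.out.println(describeRange(0L, 100L));     // long
        System.out.println(describeRange(0.5f, 1.5f));   // float
        System.out.println(describeRange(0.5, 1.5));     // double: the f was forgotten
    }
}
```

Nothing fails at compile time in the first or last call; the caller simply gets a different variant than intended, which against an index encoded as long would match nothing.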
