lucene-dev mailing list archives

From "Uwe Schindler (JIRA)"
Subject [jira] Commented: (LUCENE-1673) Move TrieRange to core
Date Sun, 14 Jun 2009 14:18:07 GMT


Uwe Schindler commented on LUCENE-1673:

bq. use SortField.PREFIX_ENCODED_INT for the trie ones

This needlessly couples the Trie stuff strongly to the SortField stuff. Something along the
lines of the current TrieUtils.getIntSortField(fname, reverse) seems preferable.

bq. add TrieUtils.XxxParser to FieldCache (but accessible)

The Trie parsers belong in the Trie class.

bq. re-use INT (and so on) in Sort and cache code, where the data type is meant (we already
have this in lots of code around), but where the encoding is meant use PLAIN_TEXT_* vs. PREFIX_ENCODED_*.

I didn't understand that sentence.

On the other hand, SortField.INT is also strongly linked to the plain-text encoding of
these tokens. My proposal was to unlink the index encoding of numeric data types from the
sorting/field cache code and its constants. It should not make a difference whether you encoded
the value using Integer.toString() or TrieUtils: in both cases the sorting code is identical,
only the parser is different.
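
To illustrate that the parser is the only part that changes, here is a minimal, self-contained
sketch in plain Java. It does not use the Lucene API: IntParser, fillCache, sortDocs and
encodeSortable are hypothetical names made up for this example, and the hex encoding is only a
stand-in for a second term encoding, not the actual TrieUtils format.

```java
import java.util.Arrays;

class ParserDecouplingSketch {
    /** Hypothetical stand-in for a field-cache parser: decodes an indexed term to an int. */
    interface IntParser {
        int parseInt(String term);
    }

    /** Parser for terms written with Integer.toString(). */
    static final IntParser PLAIN_TEXT = Integer::parseInt;

    /** Assumed alternative encoding (NOT TrieUtils): flip the sign bit, write 8 hex digits. */
    static String encodeSortable(int value) {
        return String.format("%08x", value ^ Integer.MIN_VALUE);
    }

    /** Parser that reverses encodeSortable(). */
    static final IntParser SORTABLE_HEX =
        term -> (int) Long.parseLong(term, 16) ^ Integer.MIN_VALUE;

    /** Fills a per-document value cache; the encoding is only visible through the parser. */
    static int[] fillCache(String[] termPerDoc, IntParser parser) {
        int[] values = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            values[doc] = parser.parseInt(termPerDoc[doc]);
        }
        return values;
    }

    /** Sorts document ids by cached value; this code never sees the term encoding. */
    static Integer[] sortDocs(int[] cache) {
        Integer[] docs = new Integer[cache.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }
        Arrays.sort(docs, (a, b) -> Integer.compare(cache[a], cache[b]));
        return docs;
    }

    public static void main(String[] args) {
        int[] values = {42, -7, 1000, 3};
        String[] plain = new String[values.length];
        String[] hex = new String[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            plain[doc] = Integer.toString(values[doc]);
            hex[doc] = encodeSortable(values[doc]);
        }
        Integer[] byPlain = sortDocs(fillCache(plain, PLAIN_TEXT));
        Integer[] byHex = sortDocs(fillCache(hex, SORTABLE_HEX));
        System.out.println(Arrays.toString(byPlain)); // [1, 3, 0, 2]
        System.out.println(Arrays.equals(byPlain, byHex)); // true
    }
}
```

Both index encodings yield the same document order, because sortDocs only ever works on the
decoded int values.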

Because of this, if we stay with SortField.INT and so on, I would tend to make the corresponding
Parser/FieldCache a required arg of SortField, defaulting to the current parsers for the deprecated

So anyone using new code must always specify the parser when using SortField.INT (SortField.AUTO
is already deprecated, so no problem). The same goes for FieldCache: always specify the parser
when getting an instance. For that, the current default parsers should be made publicly accessible.

bq. As far as what package it makes sense to go in... what about an analysis.numeric package

TrieUtils is used in both analysis and searching, which is why I tend toward util. The NumericTokenStream
is in analysis (in my not-yet-released patch), ShiftAttribute in analysis.tokenattributes
and TrieRangeQuery/Filter in search.

bq. As a general comment, moving TrieRange to core should be moving it to the core and perhaps
renaming the classes if we can think of a better name. Some of the other stuff belongs in
a different issue.

I think this is correct. I will post a patch soon that leaves TrieUtils alive.

> Move TrieRange to core
> ----------------------
>                 Key: LUCENE-1673
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
> TrieRange was iterated many times and seems stable now (LUCENE-1470, LUCENE-1582, LUCENE-1602).
> There is lots of user interest, Solr added it to its default FieldTypes (SOLR-940) and if
> possible I want to move it to core before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how should they
> be called in core? I would suggest to leave it as it is. On the other hand, if this keeps
> our only numeric query implementation, we could call it LongRangeQuery, IntRangeQuery or NumericRangeQuery
> (see below, here are problems). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into one class:
> NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The problem here: ctors must be
> able to pass int, long, double, float as range parameters. For the end user, mixing these
> 4 types in one class is hard to handle. If somebody forgets to add a L to a long, it suddenly
> instantiates a int version of range query, hitting no results and so on. Same with other types.
> Maybe accept java.lang.Number as parameter (because nullable for half-open bounds) and one
> enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into o.a.l.analysis.tokenattributes?
> Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these TokenStreams
> and omits norms/tf per default for easier addition to Document instances?
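
The overload problem mentioned in the second point of the issue description (forgetting the L
on a long literal) can be reproduced with a small, self-contained Java sketch. RangeOverloadSketch,
its range methods and the NumericType enum are hypothetical names for illustration only; they are
not existing Lucene classes, and the methods just report which variant the compiler selected.

```java
class RangeOverloadSketch {
    /** Hypothetical enum naming the numeric type explicitly. */
    enum NumericType { INT, LONG }

    // Overloaded variants: the literal types alone decide which one is called.
    static String range(int min, int max) {
        return "int range";
    }

    static String range(long min, long max) {
        return "long range";
    }

    // Alternative along the lines of the issue text: Number bounds plus an explicit type.
    // A null bound stands for an open end of a half-open range.
    static String range(NumericType type, Number min, Number max) {
        return type + "[" + bound(type, min) + " TO " + bound(type, max) + "]";
    }

    private static String bound(NumericType type, Number n) {
        if (n == null) {
            return "*";
        }
        return type == NumericType.LONG
            ? Long.toString(n.longValue())
            : Integer.toString(n.intValue());
    }

    public static void main(String[] args) {
        System.out.println(range(1, 10));    // int range (no L suffix, int overload chosen)
        System.out.println(range(1L, 10L));  // long range
        System.out.println(range(NumericType.LONG, 1, null)); // LONG[1 TO *]
    }
}
```

With the explicit enum, a missing L suffix no longer changes which kind of range is built,
since the bound is converted according to the declared type.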

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.