lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1673) Move TrieRange to core
Date Mon, 15 Jun 2009 19:23:07 GMT


Uwe Schindler commented on LUCENE-1673:

Mike: This will apply to int/long/float/double as well right? How would you
do this (require a parser for only numeric sorts) back-compatibly? EG,
the others (String, DOC, etc.) don't require a parser.

Yonik: Allow passing parser==null to get the default?

bq. We could alternatively make NumericSortField (subclassing SortField), that just uses the
right parser?

A factory method TrieUtils.getSortField() could also return the right SortField.

I want to move this into a separate issue afterwards; I will open one.

Nevertheless, I would like to remove emphasis from NumericUtils (which is in reality a helper
class). So I want to make the current human-readable numeric parsers public and also add the
trie parsers to FieldCache.

The SortField factory would then be the only part really needed in NumericUtils, and even
that is not strictly necessary. The parser is a singleton, works for all trie fields, and could
also live somewhere else, or nowhere at all if the parsers all stay in FieldCache.
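
To make the two options above concrete (parser==null selecting the default, versus a factory that plugs in the trie parser), here is a minimal, self-contained sketch. It does not use the Lucene API: LongParser, SortSpec, DEFAULT_PARSER, TRIE_PARSER and this getSortField() are hypothetical stand-ins, and the hexadecimal "trie" format is only a placeholder for the real term encoding.

```java
import java.util.Objects;

final class SortFieldFactorySketch {

    /** Hypothetical stand-in for a FieldCache-style parser: turns an indexed term into a long. */
    interface LongParser {
        long parseLong(String term);
    }

    /** Default parser for human-readable terms such as "255". */
    static final LongParser DEFAULT_PARSER = Long::parseLong;

    /**
     * Stand-in for the trie parser singleton. Real trie terms use a different
     * encoding; hexadecimal is used here only as a second, distinguishable format.
     */
    static final LongParser TRIE_PARSER = term -> Long.parseLong(term, 16);

    /** Hypothetical stand-in for a sort specification on one field. */
    static final class SortSpec {
        final String field;
        final LongParser parser;

        SortSpec(String field, LongParser parser) {
            this.field = Objects.requireNonNull(field, "field");
            // parser == null keeps old call sites working by selecting the default.
            this.parser = (parser != null) ? parser : DEFAULT_PARSER;
        }
    }

    /** Factory variant: callers never see the parser, they just get the right spec. */
    static SortSpec getSortField(String field) {
        return new SortSpec(field, TRIE_PARSER);
    }

    public static void main(String[] args) {
        SortSpec plain = new SortSpec("price", null);
        SortSpec trie = getSortField("price");
        System.out.println(plain.parser.parseLong("255")); // prints 255
        System.out.println(trie.parser.parseLong("ff"));   // prints 255
    }
}
```

Because the parser is a stateless singleton in this sketch, the factory is a one-liner and could live in any class, which is the point made above.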

bq. Should we support byte/short for trie indexed fields as well? (Since SortField, FieldCache
support these numeric types too...). 

For bytes, TrieRange is not very interesting. For shorts, maybe, but I would subsume them
during indexing as simple integers. Dedicated short support would not speed up searching; it
would only reduce index size a little bit.

bq. Could we change ShiftAttribute -> NumericShiftAttribute?

No problem, I will do this. The link from the TokenStream to this attribute is also missing in
the javadocs; see also my reply on java-dev to Grant's mail.

bq. Can we rename RangeQuery -> TextRangeQuery (TermRangeQuery), to make it clear that
its range checking is by Term sort order.

We can do this and deprecate the old one, but I added a note to Javadocs (see patch). I would
do this outside of this issue.

bq. How about oal.util.NumericUtils instead of TrieUtils?

That was my first idea, too. What should we do with o.a.l.doc.NumberTools (deprecate it?). We
should also update contrib/spatial to use NumericUtils instead of the copied and not really good
NumberUtils from Solr (Yonik said it was written at a very early stage, and is not effective with
UTF-8 encoding and the TermEnum positioning with the term prefixes). It would be an index-format
change for spatial, but as the code has not yet been released (in Lucene), the Lucene version
should not use NumberUtils at all.
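
As an illustration of what "effective with UTF-8" can mean for a numeric term encoding, here is a minimal sketch. It is neither the NumericUtils nor the NumberUtils code: the class and method names are hypothetical, and it omits the shift prefix that trie terms carry. It only shows the general idea of encoding a long so that term order equals numeric order while every char stays below 0x80 and therefore costs one byte in UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class SortableLongSketch {

    /**
     * Encodes a long as 10 chars of 7 bits each (10 * 7 = 70 >= 64 bits).
     * Lexicographic order of the resulting strings equals signed numeric order.
     */
    static String encode(long value) {
        // Flip the sign bit so that unsigned bit order matches signed numeric order.
        long sortable = value ^ Long.MIN_VALUE;
        char[] buf = new char[10];
        for (int i = buf.length - 1; i >= 0; i--) {
            buf[i] = (char) (sortable & 0x7f); // low 7 bits go to the rightmost free slot
            sortable >>>= 7;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        String minusOne = encode(-1L);
        String zero = encode(0L);
        System.out.println(minusOne.compareTo(zero) < 0); // prints true
        System.out.println(zero.getBytes(StandardCharsets.UTF_8).length); // prints 10
    }
}
```

An encoding that uses the full 16-bit char range instead would need up to three bytes per char in UTF-8, so the shorter string would not translate into a smaller term on disk.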

> Move TrieRange to core
> ----------------------
>                 Key: LUCENE-1673
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>         Attachments: LUCENE-1673.patch, LUCENE-1673.patch
> TrieRange was iterated many times and seems stable now (LUCENE-1470, LUCENE-1582, LUCENE-1602).
There is lots of user interest, Solr added it to its default FieldTypes (SOLR-940) and if
possible I want to move it to core before release of 2.9.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery, how should they
be called in core? I would suggest to leave it as it is. On the other hand, if this remains
our only numeric query implementation, we could call it LongRangeQuery, IntRangeQuery or NumericRangeQuery
(see below, there are problems). Same for the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be moved into one class:
NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The problem here: ctors must be
able to accept int, long, double, float as range parameters. For the end user, mixing these
4 types in one class is hard to handle. If somebody forgets to add an L to a long, it suddenly
instantiates an int version of the range query, hitting no results and so on. Same with other types.
Maybe accept java.lang.Number as parameter (because nullable for half-open bounds) and one
enum for the type.
> # TrieUtils move into o.a.l.util? or document or?
> # Move TokenStreams into o.a.l.analysis, ShiftAttribute into o.a.l.analysis.tokenattributes?
Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are different impls)?
> # Maybe add a subclass of AbstractField, that automatically creates these TokenStreams
and omits norms/tf per default for easier addition to Document instances?
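
The overloading concern in the second point of the list above can be shown in plain Java, without any Lucene classes. In this minimal sketch all names (NumericRangeSketch, NumericType, overloaded, Range) are hypothetical; it only demonstrates how overload resolution picks the int variant when the L suffix is missing, and how a java.lang.Number parameter plus an enum makes the type explicit and allows null for half-open bounds.

```java
import java.util.Objects;

final class NumericRangeSketch {

    enum NumericType { INT, LONG, FLOAT, DOUBLE }

    // Overloaded variant: the literal's type silently selects the implementation.
    static String overloaded(int min, int max) { return "int"; }
    static String overloaded(long min, long max) { return "long"; }

    /** Variant with an explicit type and nullable bounds. */
    static final class Range {
        final NumericType type;
        final Number min; // null means "no lower bound"
        final Number max; // null means "no upper bound"

        Range(NumericType type, Number min, Number max) {
            this.type = Objects.requireNonNull(type, "type");
            this.min = min;
            this.max = max;
        }

        /** Inclusive containment check on long values, for illustration only. */
        boolean containsLong(long value) {
            return (min == null || value >= min.longValue())
                && (max == null || value <= max.longValue());
        }
    }

    public static void main(String[] args) {
        System.out.println(overloaded(1, 100));   // prints int (the L was forgotten)
        System.out.println(overloaded(1L, 100L)); // prints long

        // The literal 1 is boxed to an Integer, but the enum still says LONG.
        Range open = new Range(NumericType.LONG, 1, null);
        System.out.println(open.type);                        // prints LONG
        System.out.println(open.containsLong(5_000_000_000L)); // prints true
    }
}
```

The trade-off is that the Number variant moves the type check from compile time to run time, so a mismatch between the enum and the indexed field would still have to be documented or validated.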

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
