lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1673) Move TrieRange to core
Date Sun, 14 Jun 2009 11:04:07 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719246#action_12719246 ]

Uwe Schindler edited comment on LUCENE-1673 at 6/14/09 4:03 AM:
----------------------------------------------------------------

Here are my own thoughts:

bq. But the latter can be refactored, to SortField.TRIE_XXX (not good name, as TRIE no longer
used) and the parser instances can be added to FieldCache.

From the API perspective of FieldCache and sorting, in my opinion it was never a good idea
to tie the encoding used in the index to the implementation everywhere.

- deprecate SortField.INT and use SortField.PLAIN_TEXT_INT instead (and likewise for the other types)
- use SortField.PREFIX_ENCODED_INT for the trie ones (a better name, since this is the internal
encoding name used by TrieUtils)
- rename the default parsers in FieldCache (currently private) to PlainText* as well, and make them accessible
- add the TrieUtils.XxxParser instances to FieldCache (also accessible)
- re-use INT (and so on) in the Sort and cache code where the data type is meant (we already
do this in lots of code), but use PLAIN_TEXT_ vs. PREFIX_ENCODED_ where the encoding is meant.
So the Comparators would use the native type names, while the implementation (where the
underlying encoding matters) would use the encoding names.
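A minimal Java sketch of the proposed split between data type and encoding. `IntTermParser`, `SORTABLE_INT` and `toSortableTerm` are hypothetical names, not Lucene API, and the sortable encoding only stands in for the TrieUtils prefix coding: it omits the shift marker and the lower-precision terms, so it is not Lucene's real format. The sorting side works purely on the decoded `int`, while each encoding needs its own matching parser.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene code): the data type (int) is kept separate from the
 * term encoding (plain text vs. a simplified sortable encoding). Sorting only needs the
 * data type once a parser matching the index encoding has decoded the terms.
 */
public final class IntEncodingSketch {

    /** Hypothetical parser interface: decodes one indexed term into an int. */
    public interface IntTermParser {
        int parse(String term);
    }

    /** Plain-text encoding: terms are decimal strings such as "42". */
    public static final IntTermParser PLAIN_TEXT_INT = Integer::parseInt;

    /**
     * Simplified sortable encoding, loosely modelled on full-precision trie terms (no shift
     * marker, no lower-precision terms): flip the sign bit so that unsigned order equals
     * signed order, then write 7 bits per char, most significant bits first.
     */
    public static String toSortableTerm(int value) {
        int sortable = value ^ 0x80000000;
        char[] chars = new char[5]; // 5 chars * 7 bits = 35 bits >= 32 bits
        for (int i = chars.length - 1; i >= 0; i--) {
            chars[i] = (char) (sortable & 0x7f);
            sortable >>>= 7;
        }
        return new String(chars);
    }

    /** Parser for the simplified sortable encoding above (stands in for a PREFIX_ENCODED_ parser). */
    public static final IntTermParser SORTABLE_INT = term -> {
        int sortable = 0;
        for (int i = 0; i < term.length(); i++) {
            sortable = (sortable << 7) | term.charAt(i);
        }
        return sortable ^ 0x80000000; // undo the sign-bit flip
    };

    /** The "comparator" side: decode with the given parser, then sort by the native int type. */
    public static int[] sortedValues(String[] terms, IntTermParser parser) {
        int[] values = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            values[i] = parser.parse(terms[i]);
        }
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        int[] raw = {42, -7, 1000};
        String[] plain = new String[raw.length];
        String[] sortable = new String[raw.length];
        for (int i = 0; i < raw.length; i++) {
            plain[i] = Integer.toString(raw[i]);
            sortable[i] = toSortableTerm(raw[i]);
        }
        // Same data type, different encodings: each needs its matching parser.
        System.out.println(Arrays.toString(sortedValues(plain, PLAIN_TEXT_INT)));  // [-7, 42, 1000]
        System.out.println(Arrays.toString(sortedValues(sortable, SORTABLE_INT))); // [-7, 42, 1000]
        try {
            sortedValues(sortable, PLAIN_TEXT_INT); // wrong parser for this encoding
        } catch (NumberFormatException e) {
            System.out.println("plain-text parser cannot read sortable terms");
        }
    }
}
```

Feeding sortable terms to the plain-text parser throws a `NumberFormatException`, which is exactly the kind of coupling between index encoding and parser that the naming proposal makes explicit.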

I know these are big changes, but there has been a lot of productive work here recently (thanks
Shai, Jason, Michael), so there are now many new APIs that are largely decoupled from the underlying
encoding. This would once again mean renaming a lot of internal parts, but thanks to deprecation
it could be done alongside Shai's, Michael's and Jason's changes here.

> Move TrieRange to core
> ----------------------
>
>                 Key: LUCENE-1673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>
> TrieRange has been iterated on many times and now seems stable (LUCENE-1470, LUCENE-1582, LUCENE-1602).
> There is a lot of user interest, Solr added it to its default FieldTypes (SOLR-940), and if
> possible I want to move it to core before the 2.9 release.
> Before this can be done, there are some things to think about:
> # There are now classes called LongTrieRangeQuery and IntTrieRangeQuery; what should they
> be called in core? I would suggest leaving the names as they are. On the other hand, if this remains
> our only numeric query implementation, we could call them LongRangeQuery, IntRangeQuery or NumericRangeQuery
> (but see below, there are problems with this). The same applies to the TokenStreams and Filters.
> # Maybe the pairs of classes for indexing and searching should be merged into one class each:
> NumericTokenStream, NumericRangeQuery, NumericRangeFilter. The problem here: the ctors must be
> able to accept int, long, double and float as range parameters. For the end user, mixing these
> 4 types in one class is hard to handle. If somebody forgets to add an L to a long literal, an int
> version of the range query is suddenly instantiated instead, hitting no results, and so on. The same
> applies to the other types. Maybe accept java.lang.Number as the parameter type (it is nullable,
> which allows half-open bounds) together with an enum for the type.
> # Should TrieUtils move into o.a.l.util? Or o.a.l.document? Somewhere else?
> # Move the TokenStreams into o.a.l.analysis and ShiftAttribute into o.a.l.analysis.tokenattributes?
> Somewhere else?
> # If we rename the classes, should Solr stay with Trie (because there are different impls)?
> # Maybe add a subclass of AbstractField that automatically creates these TokenStreams
> and omits norms/tf by default, for easier addition to Document instances?
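A minimal Java sketch of the Number-plus-enum idea from the second point above. `NumericRangeSketch` and `NumericType` are hypothetical names, not Lucene API, and the class only checks values against its bounds rather than building a query or filter. Because the enum, not the boxed class of the argument, decides the conversion, a lower bound written as `0` instead of `0L` is still compared as a long, and `null` expresses a half-open range. With overloaded `(int, int)` and `(long, long)` constructors, the same forgotten `L` would silently select the int overload instead.

```java
import java.util.Objects;

/** Hypothetical sketch: one range class taking java.lang.Number bounds plus an explicit type enum. */
public final class NumericRangeSketch {

    /** The declared numeric type governs how bounds and values are converted. */
    public enum NumericType { INT, LONG, FLOAT, DOUBLE }

    private final NumericType type;
    private final Number min; // null = open lower bound
    private final Number max; // null = open upper bound

    public NumericRangeSketch(NumericType type, Number min, Number max) {
        this.type = Objects.requireNonNull(type, "type");
        this.min = min;
        this.max = max;
    }

    /** Inclusive range check; bounds are converted according to the enum, not the boxed class. */
    public boolean contains(Number value) {
        switch (type) {
            case INT:
                int i = value.intValue();
                return (min == null || i >= min.intValue()) && (max == null || i <= max.intValue());
            case LONG:
                long l = value.longValue();
                return (min == null || l >= min.longValue()) && (max == null || l <= max.longValue());
            case FLOAT:
                float f = value.floatValue();
                return (min == null || f >= min.floatValue()) && (max == null || f <= max.floatValue());
            default: // DOUBLE
                double d = value.doubleValue();
                return (min == null || d >= min.doubleValue()) && (max == null || d <= max.doubleValue());
        }
    }

    public static void main(String[] args) {
        // The caller "forgot the L" on the lower bound, but the LONG enum still governs conversion.
        NumericRangeSketch r = new NumericRangeSketch(NumericType.LONG, 0, 5_000_000_000L);
        System.out.println(r.contains(4_000_000_000L)); // true
        // Half-open range: a null upper bound means "no upper limit".
        NumericRangeSketch open = new NumericRangeSketch(NumericType.INT, 10, null);
        System.out.println(open.contains(Integer.MAX_VALUE)); // true
    }
}
```

The trade-off is that the type is only checked at runtime: nothing stops a caller from passing a Double bound to an INT range, where intValue() silently truncates it.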
