lucene-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: [jira] Commented: (LUCENE-1673) Move TrieRange to core
Date Tue, 09 Jun 2009 18:48:06 GMT
> I wonder if we could handle this by adding a setting in FieldInfo?

Do we have an issue open that would allow any metadata on a per-field basis?
This seems like something flexible indexing will require?

On Tue, Jun 9, 2009 at 10:15 AM, Michael McCandless (JIRA)
<jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/LUCENE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717754#action_12717754]
>
> Michael McCandless commented on LUCENE-1673:
> --------------------------------------------
>
> {quote}
> In Solr there are three different impls:
>
> Trie (of course)
> Text-only numbers (do not work with range queries, but can be used for
> sorting etc.)
> A binary encoding (also used by LocalLucene at the moment), that is
> sortable. This can be used for RangeQueries, but sorting is slow (because
> they have no parser, and at the time it was implemented, SortField had no
> parser support)
> {quote}
>
> Ahh OK, this is just Solr's pre-existing numeric field support.  (I
> had thought you meant Solr had a different impl for Trie).
>
> bq. The problem, because of backwards compatibility they need to be
> preserved (possibility to read old indexes).
>
> This is indeed quite a challenge.  Actually is there anything in Trie
> that encodes which version of the format is indexed in a given
> segment?  (So that if we do ever change the indexed format, we can
> bump a version somewhere to keep back compat).
>
> bq. Maybe we use static factories instead of overloaded ctors. That way the
> names differ, but each one just creates an instance of the same
> class: NumericRangeQuery.newFloatRange(Float a, Float b, precisionStep) and
> so on. Same for the TokenStreams (and the Field?)
>
> That sounds like a good approach?
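
To make the static-factory proposal above concrete, here is a minimal sketch. It is not Lucene code: the class name NumericRangeSketch, the field argument, the parameter order, and the Type enum are all assumptions made for illustration. Only the newFloatRange name and the use of boxed, nullable bounds come from the thread.

```java
// Hypothetical sketch of the static-factory idea; this is not Lucene code.
final class NumericRangeSketch {
    // Assumed enum recording which numeric type the range was built for.
    enum Type { INT, LONG, FLOAT, DOUBLE }

    final String field;
    final int precisionStep;
    final Type type;
    final Number min; // null means an open lower bound
    final Number max; // null means an open upper bound

    // Private ctor: callers must go through a factory that names the type.
    private NumericRangeSketch(String field, int precisionStep, Type type,
                               Number min, Number max) {
        this.field = field;
        this.precisionStep = precisionStep;
        this.type = type;
        this.min = min;
        this.max = max;
    }

    static NumericRangeSketch newIntRange(String field, int precisionStep,
                                          Integer min, Integer max) {
        return new NumericRangeSketch(field, precisionStep, Type.INT, min, max);
    }

    static NumericRangeSketch newLongRange(String field, int precisionStep,
                                           Long min, Long max) {
        return new NumericRangeSketch(field, precisionStep, Type.LONG, min, max);
    }

    static NumericRangeSketch newFloatRange(String field, int precisionStep,
                                            Float min, Float max) {
        return new NumericRangeSketch(field, precisionStep, Type.FLOAT, min, max);
    }

    static NumericRangeSketch newDoubleRange(String field, int precisionStep,
                                             Double min, Double max) {
        return new NumericRangeSketch(field, precisionStep, Type.DOUBLE, min, max);
    }

    @Override
    public String toString() {
        return type + " range on " + field + ": [" + min + " TO " + max + "]";
    }

    public static void main(String[] args) {
        // Half-open range: no upper bound.
        System.out.println(newLongRange("timestamp", 8, 1000L, null));
        // newLongRange("timestamp", 8, 1000, null) would not compile,
        // because an int literal cannot be boxed to Long.
    }
}
```

One side effect of boxed parameters in this sketch: a forgotten L suffix becomes a compile error rather than a silent switch to the int variant.
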
>
> {quote}
> > When you want to sort, pass the TrieUtils.FIELD_CACHE_LONG_PARSER
> > to your SortField
>
> Or add new SortField types.
>
> The problem with all this: For old indexes, we need some backwards
> compatibility. Ideally we would just create numeric fields in the new way
> and reuse e.g. SortField.INT for this. But this cannot be done. Or even,
> replace the FieldCache parsers by the trie ones. But this cannot be done at
> the moment.
> {quote}
>
> I wonder if we could handle this by adding a setting in FieldInfo?
> Ie, to record that "this numeric field was indexed as a trie".  Then,
> when we need to get the parser for SortField.INT, we'd check the
> FieldInfo to see which parser to use.  This could also handle
> back-compat, ie if we change the trie format being written we'd change
> the setting and segment merging would gradually upgrade previously
> indexed fields.
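
The per-field setting described above could look roughly like the following sketch. Everything here is hypothetical: FieldEncodingSketch stands in for FieldInfo, NumericEncoding and IntParser are invented names, and the hex-based sortable encoding is a simple stand-in, not the real trie format. The point is only that the parser is looked up from a recorded per-field setting, with unmarked (old) fields falling back to plain-text parsing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-field setting; this is not Lucene's FieldInfo.
final class FieldEncodingSketch {
    // Assumed setting values; a format change would add e.g. TRIE_V2.
    enum NumericEncoding { PLAIN_TEXT, TRIE_V1 }

    // Assumed parser interface: turns an indexed term back into an int.
    interface IntParser { int parseInt(String term); }

    // Stand-in sortable encoding (NOT the real trie format): flip the sign
    // bit and write 8 hex digits, so string order matches numeric order.
    static String encodeSortable(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    static final IntParser TEXT_PARSER = Integer::parseInt;
    static final IntParser SORTABLE_PARSER =
        term -> ((int) Long.parseLong(term, 16)) ^ 0x80000000;

    private final Map<String, NumericEncoding> encodingByField = new HashMap<>();

    // Record how a field was indexed.
    void setEncoding(String field, NumericEncoding encoding) {
        encodingByField.put(field, encoding);
    }

    // Pick the parser from the recorded setting. Fields without a setting
    // are treated as old-style plain-text numbers (back-compat default).
    IntParser parserFor(String field) {
        NumericEncoding encoding =
            encodingByField.getOrDefault(field, NumericEncoding.PLAIN_TEXT);
        return encoding == NumericEncoding.TRIE_V1 ? SORTABLE_PARSER : TEXT_PARSER;
    }

    public static void main(String[] args) {
        FieldEncodingSketch infos = new FieldEncodingSketch();
        infos.setEncoding("price", NumericEncoding.TRIE_V1);
        System.out.println(infos.parserFor("price").parseInt(encodeSortable(-5)));
        System.out.println(infos.parserFor("legacy").parseInt("17"));
    }
}
```
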
>
> {quote}
> > I'd also like to rename RangeQuery to something else, with this
> > change. EG TermRangeQuery... to emphasize that you use it for
> > non-numbers. The javadocs of TermRangeQuery should point to
> > Int/LongRangeQuery as strongly preferred for numeric ranges.
>
> Cool. For the others, too (FieldCacheRangeQuery).
> {quote}
>
> Yes.
>
>
> > Move TrieRange to core
> > ----------------------
> >
> >                 Key: LUCENE-1673
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1673
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Search
> >    Affects Versions: 2.9
> >            Reporter: Uwe Schindler
> >            Assignee: Uwe Schindler
> >             Fix For: 2.9
> >
> >
> > TrieRange was iterated many times and seems stable now (LUCENE-1470,
> LUCENE-1582, LUCENE-1602). There is lots of user interest, Solr added it to
> its default FieldTypes (SOLR-940) and if possible I want to move it to core
> before release of 2.9.
> > Before this can be done, there are some things to think about:
> > # There are now classes called LongTrieRangeQuery, IntTrieRangeQuery; how
> should they be named in core? I would suggest leaving them as they are. On the
> other hand, if this remains our only numeric query implementation, we could
> call them LongRangeQuery, IntRangeQuery or NumericRangeQuery (see below; there
> are problems with that). Same for the TokenStreams and Filters.
> > # Maybe the pairs of classes for indexing and searching should be moved
> into one class: NumericTokenStream, NumericRangeQuery, NumericRangeFilter.
> The problem here: ctors must be able to accept int, long, double, float as
> range parameters. For the end user, mixing these 4 types in one class is
> hard to handle. If somebody forgets to add an L to a long, it suddenly
> instantiates an int version of the range query, returning no results, and so on.
> Same with other types. Maybe accept java.lang.Number as parameter (because
> nullable for half-open bounds) and one enum for the type.
> > # TrieUtils move into o.a.l.util? or document or?
> > # Move TokenStreams into o.a.l.analysis, ShiftAttribute into
> o.a.l.analysis.tokenattributes? Somewhere else?
> > # If we rename the classes, should Solr stay with Trie (because there are
> different impls)?
> > # Maybe add a subclass of AbstractField, that automatically creates these
> TokenStreams and omits norms/tf per default for easier addition to Document
> instances?
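
The overload concern raised in the second item of the list above is easy to reproduce in plain Java. The sketch below uses a hypothetical class, OverloadPitfallSketch, with two static methods standing in for int and long constructors. It shows only how the compiler picks between them; it does not involve any Lucene class.

```java
// Hypothetical demo of the overload pitfall; no Lucene classes involved.
final class OverloadPitfallSketch {
    // Stand-in for an int-based range ctor.
    static String range(int min, int max) {
        return "int";
    }

    // Stand-in for a long-based range ctor.
    static String range(long min, long max) {
        return "long";
    }

    public static void main(String[] args) {
        // Literals without the L suffix are ints, so the int overload wins
        // even if the field was indexed with long values.
        System.out.println(range(1, 100));   // prints "int"
        System.out.println(range(1L, 100L)); // prints "long"
        // A single long argument is enough to select the long overload.
        System.out.println(range(1, 100L));  // prints "long"
    }
}
```
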
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
>
