lucene-java-user mailing list archives

From Kumaran Ramasubramanian <kums....@gmail.com>
Subject Re: Sorting, Range Query, faceting - NumericDocValuesField Vs LongField
Date Fri, 23 Dec 2016 11:40:18 GMT
Thanks, Erick and Mike. I am using Lucene 4.10.4 directly.


I have observed better sorting performance with LongField than with
lexicographic (string) sorting. I understand this is due to the trie
structure of LongField.

One more question, though: will the uninversion process happen for
IntField / LongField too?

Thanks for the link, Mike. I will look into LongPoint in recent versions.

--
Kumaran R
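
To illustrate the lexicographic-versus-numeric distinction discussed in this thread, here is a minimal plain-Java sketch. It does not use Lucene at all; the class and method names (SortOrderDemo, lexicographic, numeric) are made up for illustration. It only shows why sorting numbers as strings gives a different order from sorting them as longs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortOrderDemo {

    /** Sorts the values as strings, i.e. character by character. */
    static List<String> lexicographic(List<Long> values) {
        List<String> terms = new ArrayList<>();
        for (long v : values) {
            terms.add(Long.toString(v));
        }
        Collections.sort(terms); // String.compareTo: "100" < "25" < "9"
        return terms;
    }

    /** Sorts the values by their numeric magnitude. */
    static List<Long> numeric(List<Long> values) {
        List<Long> copy = new ArrayList<>(values);
        Collections.sort(copy); // Long.compareTo: 9 < 25 < 100
        return copy;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(100L, 9L, 25L);
        System.out.println(lexicographic(values)); // [100, 25, 9]
        System.out.println(numeric(values));       // [9, 25, 100]
    }
}
```

The string order puts 100 before 25 because the comparison stops at the first differing character, which is the behavior a numeric field type is meant to avoid.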

On Fri, Dec 23, 2016 at 4:51 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Note that Erick is giving you the Solr syntax below, but if you are
> using Lucene directly, that obviously doesn't apply (though the same
> general concepts do).
>
> I would strongly recommend not using uninversion: it's an archaic and
> costly option that Lucene only offered long ago because it didn't have
> doc values, but that changed many years ago now.
>
> Also, the new dimensional points (IntPoint, LongPoint) give better
> performance than the legacy postings-based ("trie") numerics.
>
> See https://www.elastic.co/blog/apache-lucene-numeric-filters for some
> of the history here ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Dec 22, 2016 at 10:37 PM, Erick Erickson
> <erickerickson@gmail.com> wrote:
> > bq: Does this mean LongField/IntField just supports lexicographic
> > order in sorting?
> >
> > No, on several counts.
> >
> > No numeric type (long, int, float, double or trie values) supports
> > lexicographic sorting. That's the whole _point_ of having numeric
> > types in the first place. Well, that and efficient range queries in
> > the Trie variants.
> >
> > docValues are an additional _attribute_ on the field so it's perfectly
> > reasonable to have a long field that's both
> > indexed="true"  and docValues="true". Or
> > indexed="true"  and docValues="false". Or
> > indexed="false" and docValues="true". Or
> > indexed="false" and docValues="false"
> >
> > Do not think of them as separate field types.
> >
> > indexed="true" is _required_ for searching. A field with
> > indexed="true" and docValues="false" also supports faceting, grouping
> > and sorting (numeric).
> >
> > A field with docValues="true" supports faceting, grouping and
> > sorting without having to "uninvert" the field in the Java heap;
> > the data lives in the OS cache instead. See Uwe's excellent blog here:
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Best,
> > Erick
> >
> > On Thu, Dec 22, 2016 at 6:57 PM, Kumaran Ramasubramanian
> > <kums.134@gmail.com> wrote:
> >> Thank you Adrien.
> >>
> >> "NumericDocValuesField is the one that supports sorting."
> >>
> >> Does this mean LongField/IntField just supports lexicographic order in
> >> sorting?
> >>
> >>
> >> -
> >> Kumaran R
> >>
> >>
> >>
> >> On Dec 22, 2016 11:28 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
> >>
> >> On Thu, Dec 22, 2016 at 6:50 PM, Kumaran Ramasubramanian
> >> <kums.134@gmail.com> wrote:
> >>
> >>> I want to provide sorting, range search and faceting on numeric fields.
> >>>
> >>> AFAIK, the purposes of the different numeric field types are:
> >>>
> >>> NumericDocValuesField supports sorting and faceting
> >>> LongField/IntField support range queries and sorting
> >>>
> >>
> >> LongField/IntField only support querying; NumericDocValuesField is the
> >> one that supports sorting.
> >>
> >> Also note that as of 6.0 LongField and IntField have been replaced with
> >> LongPoint and IntPoint.
> >>
> >>
> >>> 1. Should I index the same field as both of the above types to get all
> >>> three features on a numeric field?
> >>>
> >>
> >> Yes. By the way it is perfectly fine to use the same field name for the
> >> point field and the doc values field.
> >>
> >>
> >>> 2. If I am ready to sacrifice faceting, is it advisable to use LongField
> >>> for sorting and range queries?
> >>>
> >>
> >> As said above, you need doc values for sorting.
> >>
> >>
> >>> 3. During sorting, will NumericDocValuesField (column-stride storage)
> >>> perform better than LongField (trie structure)? If so, should I
> >>> duplicate the field in both cases 1 and 2?
> >>>
> >>
> >> Same note here.
> >>
> >> Adrien
> >
>
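
The "uninversion" that Erick and Mike mention in the quoted messages can be pictured with a toy model. The sketch below is plain Java, not Lucene code, and every name in it (UninvertDemo, uninvert, postings) is hypothetical. It assumes an inverted index is simply a map from a value to the ids of the documents containing it. Sorting needs the opposite mapping, from document id to value, so without a stored per-document column that mapping has to be rebuilt on the heap by walking every term.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class UninvertDemo {

    /**
     * Builds a doc-id -> value array by walking every term's posting list.
     * Toy model only: documents without a value keep the default 0.
     */
    static long[] uninvert(Map<Long, int[]> postings, int maxDoc) {
        long[] column = new long[maxDoc];
        for (Map.Entry<Long, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                column[docId] = entry.getKey();
            }
        }
        return column;
    }

    public static void main(String[] args) {
        // Inverted view: value -> ids of the documents that contain it.
        Map<Long, int[]> postings = new TreeMap<>();
        postings.put(9L, new int[] {1});
        postings.put(25L, new int[] {0, 3});
        postings.put(100L, new int[] {2});

        // Column view needed for sorting: index = doc id, element = value.
        long[] column = uninvert(postings, 4);
        System.out.println(Arrays.toString(column)); // [25, 9, 100, 25]
    }
}
```

In this picture, doc values correspond to writing the column array at index time, so nothing like uninvert has to run (or occupy heap) at search time.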
