lucene-java-user mailing list archives

From Piotr Idzikowski <piotridzikow...@gmail.com>
Subject Re: Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting
Date Tue, 16 Dec 2014 14:25:41 GMT
Hello.
Thanks for your reply.

On Tue, Dec 16, 2014 at 3:14 PM, Adrien Grand <jpountz@gmail.com> wrote:
>
> Hi Piotr,
>
> On Mon, Dec 15, 2014 at 9:43 PM, Piotr Idzikowski
> <piotridzikowski@gmail.com> wrote:
> > Hello.
> > I am going to switch to the newest (4.10.2) version of Lucene and I'd like to
> > make some optimizations in my index and code. I would like to use
> > DocValuesField to get values but also for filtering and sorting. So here I
> > have some questions: if I'd like to use a range filter
> > (FieldCacheRangeFilter), I need to store a value in XxxDocValuesField, but
> > if I want to use a terms filter (FieldCacheTermsFilter), I need to store a
> > value in SortedDocValuesField. So it looks like if I want to use both range
> > and terms filters, I need two different fields. Am I right? Am I using
> > it correctly?
>
> FieldCacheRangeFilter and FieldCacheTermsFilter only work well when
> you have lots of terms and most documents match your filter. Otherwise
> you should consider using the regular numeric range filter and terms
> filter. Although they might be a bit slower in the dense case, they
> will be significantly faster when few terms/documents match.
>
So, for instance, if I store documents with e.g. a creation date, I have data
(millions of documents) from, let's say, the last 3 years, and I'd like to use
a range filter to get the docs from a single month only, is it better to use an
ordinary numeric range query instead of FieldCacheRangeFilter?
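A stdlib-only sketch of the trade-off described above, under simplifying assumptions: `DateRangeSketch` and its methods are hypothetical, a `long[]` stands in for a per-document doc-values column, and a `TreeMap` stands in for an inverted index. The scan touches every document (as a FieldCache-based filter does), while the map lookup touches only the values inside the month. A real NumericRangeFilter works on precision-encoded terms rather than a sorted map, but its cost likewise roughly tracks what matches rather than the total document count.

```java
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model, not Lucene code: "values" stands in for a per-document
// doc-values column, "postings" for an inverted index of value -> doc ids.
final class DateRangeSketch {

  // Half-open [start of month, start of next month) in epoch milliseconds, UTC.
  static long[] monthBoundsUtc(YearMonth month) {
    long start = month.atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    long end = month.plusMonths(1).atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    return new long[] {start, end};
  }

  // FieldCache-style: test every document's value; cost grows with maxDoc.
  static List<Integer> scan(long[] values, long start, long endExclusive) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < values.length; doc++) {
      if (values[doc] >= start && values[doc] < endExclusive) {
        hits.add(doc);
      }
    }
    return hits;
  }

  // Builds the value -> doc ids map from the same per-document values.
  static TreeMap<Long, List<Integer>> buildPostings(long[] values) {
    TreeMap<Long, List<Integer>> postings = new TreeMap<>();
    for (int doc = 0; doc < values.length; doc++) {
      postings.computeIfAbsent(values[doc], k -> new ArrayList<>()).add(doc);
    }
    return postings;
  }

  // Index-style: jump to the first value in range and visit only matching entries.
  static List<Integer> viaIndex(TreeMap<Long, List<Integer>> postings, long start, long endExclusive) {
    List<Integer> hits = new ArrayList<>();
    for (List<Integer> docs : postings.subMap(start, true, endExclusive, false).values()) {
      hits.addAll(docs);
    }
    hits.sort(null); // natural order, so both strategies return doc ids ascending
    return hits;
  }
}
```

Because the month is a half-open interval, the upper bound (start of the next month) is treated as exclusive in both strategies.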


>
> Both FieldCacheRangeFilter and FieldCacheTermsFilter would work on the
> same SortedDocValues field. What makes you think you need two fields ?
>
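The point that one SortedDocValues field can serve a range filter can be illustrated with a hypothetical, stdlib-only model (not Lucene's API): a sorted array of unique values acts as the term dictionary, each document stores an ordinal (-1 when it has no value), and a range becomes an interval of ordinals found by binary search, which, like `SortedDocValues.lookupTerm`, returns `-(insertionPoint) - 1` for an absent value. The range is only chronological if the values sort correctly as strings, which is why fixed-width ISO-8601 dates are assumed here; both bounds are inclusive for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a sorted doc-values field; not Lucene's API.
final class OrdRangeSketch {
  final String[] dict;  // sorted unique values; array index == ordinal
  final int[] docOrds;  // per-document ordinal, -1 if the document has no value

  OrdRangeSketch(String[] dict, int[] docOrds) {
    this.dict = dict;
    this.docOrds = docOrds;
  }

  // Like lookupTerm(): the ordinal if present, else -(insertionPoint) - 1.
  int lookup(String value) {
    return Arrays.binarySearch(dict, value);
  }

  // Docs whose value lies in [lower, upper]; a null bound means unbounded.
  List<Integer> range(String lower, String upper) {
    int lo;
    if (lower == null) {
      lo = 0;
    } else {
      int p = lookup(lower);
      lo = p >= 0 ? p : -p - 1;  // first ordinal >= lower
    }
    int hi;
    if (upper == null) {
      hi = dict.length - 1;
    } else {
      int p = lookup(upper);
      hi = p >= 0 ? p : -p - 2;  // last ordinal <= upper
    }
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < docOrds.length; doc++) {  // still one check per document
      int ord = docOrds[doc];
      if (ord != -1 && ord >= lo && ord <= hi) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

The same ordinals could back a terms filter by marking the ordinals of the requested values, so a single sorted field is enough for both kinds of filter.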
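Because of the code. FieldCacheRangeFilter.newLongRange reads the values
through FieldCache.DEFAULT.getLongs(), i.e. from a numeric field: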

public static FieldCacheRangeFilter<Long> newLongRange(String field, FieldCache.LongParser parser,
    Long lowerVal, Long upperVal, boolean includeLower, boolean includeUpper) {
  return new FieldCacheRangeFilter<Long>(field, parser, lowerVal, upperVal, includeLower, includeUpper) {
    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
      // Turn the (possibly exclusive) bounds into inclusive ones; return null
      // (no matches) for an empty range, including where i + 1 or i - 1 would overflow.
      final long inclusiveLowerPoint, inclusiveUpperPoint;
      if (lowerVal != null) {
        long i = lowerVal.longValue();
        if (!includeLower && i == Long.MAX_VALUE)
          return null;
        inclusiveLowerPoint = includeLower ? i : (i + 1L);
      } else {
        inclusiveLowerPoint = Long.MIN_VALUE;
      }
      if (upperVal != null) {
        long i = upperVal.longValue();
        if (!includeUpper && i == Long.MIN_VALUE)
          return null;
        inclusiveUpperPoint = includeUpper ? i : (i - 1L);
      } else {
        inclusiveUpperPoint = Long.MAX_VALUE;
      }

      if (inclusiveLowerPoint > inclusiveUpperPoint)
        return null;

      // Per-document long values from the FieldCache
      final FieldCache.Longs values = FieldCache.DEFAULT.getLongs(context.reader(), field,
          (FieldCache.LongParser) parser, false);
      // matchDoc() is evaluated document by document
      return new FieldCacheDocIdSet(context.reader().maxDoc(), acceptDocs) {
        @Override
        protected boolean matchDoc(int doc) {
          final long value = values.get(doc);
          return value >= inclusiveLowerPoint && value <= inclusiveUpperPoint;
        }
      };
    }
  };
}
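The bound handling in the code above can be lifted into a small stdlib-only helper to check its edge cases. `LongBoundsSketch` is hypothetical, not part of Lucene: an exclusive bound is shifted by one to become inclusive, the two cases where that shift would overflow a `long` are reported as empty ranges (`null`), and so is any range whose lower bound ends up above its upper bound.

```java
// Hypothetical helper mirroring the bound handling in the quoted newLongRange code.
final class LongBoundsSketch {

  // Returns {inclusiveLower, inclusiveUpper}, or null when no value can match.
  static long[] inclusiveBounds(Long lowerVal, Long upperVal,
                                boolean includeLower, boolean includeUpper) {
    long lo;
    if (lowerVal != null) {
      long i = lowerVal;
      if (!includeLower && i == Long.MAX_VALUE) {
        return null;  // (MAX_VALUE, ...] is empty, and i + 1 would overflow
      }
      lo = includeLower ? i : i + 1L;
    } else {
      lo = Long.MIN_VALUE;  // unbounded below
    }
    long hi;
    if (upperVal != null) {
      long i = upperVal;
      if (!includeUpper && i == Long.MIN_VALUE) {
        return null;  // [..., MIN_VALUE) is empty, and i - 1 would overflow
      }
      hi = includeUpper ? i : i - 1L;
    } else {
      hi = Long.MAX_VALUE;  // unbounded above
    }
    return lo > hi ? null : new long[] {lo, hi};
  }

  // The per-document check matchDoc() performs once the bounds are inclusive.
  static boolean matches(long value, long[] bounds) {
    return bounds != null && value >= bounds[0] && value <= bounds[1];
  }
}
```

For example, the exclusive range (5, 10) becomes the inclusive range [6, 9], while [7, 7) collapses to an empty range.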

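FieldCacheTermsFilter, on the other hand, reads the values through
getTermsIndex(), i.e. from a SortedDocValues field:

@Override
public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
  final SortedDocValues fcsi = getFieldCache().getTermsIndex(context.reader(), field);
  // One bit per ordinal, set for each requested term present in the field
  final FixedBitSet bits = new FixedBitSet(fcsi.getValueCount());
  for (int i = 0; i < terms.length; i++) {
    int ord = fcsi.lookupTerm(terms[i]);
    if (ord >= 0) {  // lookupTerm() returns a negative value for a missing term
      bits.set(ord);
    }
  }
  // ... (rest of the method not quoted)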
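A stdlib-only model of the technique in this excerpt (hypothetical `OrdTermsSketch`, not Lucene code): first mark one bit per ordinal for the requested terms, then accept a document when the bit of its ordinal is set. The excerpt stops before the second step, so the per-document check here is an assumption about what follows; a sorted `String[]` stands in for the SortedDocValues dictionary and `java.util.BitSet` for `FixedBitSet`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of the ordinal bit-set terms filter; not Lucene code.
final class OrdTermsSketch {

  // dict: sorted unique values (array index == ordinal);
  // docOrds: per-document ordinal, -1 if the document has no value.
  static List<Integer> filter(String[] dict, int[] docOrds, String... terms) {
    // Step 1: one bit per ordinal, set for each requested term that exists.
    BitSet bits = new BitSet(dict.length);
    for (String term : terms) {
      int ord = Arrays.binarySearch(dict, term);  // negative if absent, like lookupTerm()
      if (ord >= 0) {
        bits.set(ord);
      }
    }
    // Step 2 (assumed): a document matches if the bit for its ordinal is set.
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < docOrds.length; doc++) {
      int ord = docOrds[doc];
      if (ord >= 0 && bits.get(ord)) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

Building the bit set costs one lookup per requested term, but the per-document check still runs over every document, which fits the advice that these filters pay off mainly when many documents match.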



Regards
Piotr
