lucene-java-user mailing list archives

From "Hasenberger, Josef" <Josef.Hasenber...@zetcom.com>
Subject What is the proper replacement for Filters working on DocValues fields?
Date Wed, 23 Mar 2016 09:24:51 GMT
Hello,

I am migrating a rather large application from Lucene 4.10 to Lucene 5.5.0.
Since Filters are deprecated in Lucene 5, I am looking for an efficient replacement in our
code.

We use many Filters that calculate the DocIdSet by doing a lookup of numeric DocValues in
some collection.
Everything is based on "long" types and results could be large.
Pseudo code in Filter class looks like this:

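    @Override
    public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {
        AtomicReader reader = context.reader();
        NumericDocValues docValues = reader.getNumericDocValues(filterKeyName);
        if (docValues == null) {
            return null; // this segment has no values for the field, so nothing matches
        }

        int maxDoc = reader.maxDoc();
        OpenBitSet docSet = new OpenBitSet(maxDoc); // one bit per document in this segment
        for (int doc = 0; doc < maxDoc; doc++) {
            long value = docValues.get(doc); // get the DocValue for the current doc
            if (isMatch(value)) { // check value against some condition
                docSet.set(doc); // set bit for doc
            }
        }
        // honor acceptDocs so that deleted documents are not returned
        return BitsFilteredDocIdSet.wrap(docSet, acceptDocs);
    }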
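
The same lookup pattern, stripped of all Lucene types, can be sketched in plain Java. This is only an illustration under stated assumptions: the class name DocValueFilterSketch and the method matchingDocs are hypothetical, a long[] stands in for NumericDocValues, a Set<Long> stands in for the collection behind isMatch(), and java.util.BitSet stands in for the DocIdSet.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Filter above; no Lucene classes are used.
class DocValueFilterSketch {

    /**
     * Returns a bit set with one bit set for every document whose value
     * is contained in matchValues. valuesByDoc[doc] plays the role of
     * docValues.get(doc) in the Filter code.
     */
    static BitSet matchingDocs(long[] valuesByDoc, Set<Long> matchValues) {
        BitSet docSet = new BitSet(valuesByDoc.length);
        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            if (matchValues.contains(valuesByDoc[doc])) { // the isMatch() check
                docSet.set(doc);
            }
        }
        return docSet;
    }

    public static void main(String[] args) {
        long[] valuesByDoc = {5L, 7L, 5L, 9L};
        Set<Long> matchValues = new HashSet<>(Arrays.asList(5L, 9L));
        // Docs 0, 2 and 3 carry a matching value.
        System.out.println(matchingDocs(valuesByDoc, matchValues));
    }
}
```

The point of the sketch is that the result costs one bit per document, no matter how many values are in the matching set.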


I wonder what the proper and efficient replacement for such filtering is?

Should I convert my matching value set into a TermsQuery and wrap with ConstantScoreQuery?
I could do this, but then I am concerned about:

* Efficiency:
The set of values matched in the isMatch() method above could be very large. I would need
to create a large collection of Terms rather than the memory-efficient DocIdSet.


* More efficiency:
From my current understanding, I would need to create a Term from the String representation
of each long value. Isn't this inefficient again?

I would really appreciate any recommendations on this.

Thanks a lot and best regards,
Josef

