lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Question about Payloads in Lucene 4.5
Date Sat, 22 Mar 2014 08:28:47 GMT
On Fri, Mar 21, 2014 at 10:25 PM, Rohit Banga <iamrohitbanga@gmail.com> wrote:
> Thanks Michael for your response.

You're welcome!

> Few questions:
>
> 1. Can I expect better performance when retrieving a single NumericDocValue
> for all hits vs when I retrieve documents for all hits to fetch the field
> value? As far as I understand retrieving n documents from the index
> requires n disk reads. How many disk reads do I do when using
> NumericDocValues? How are they stored?

It should be faster; doc values are stored "column stride", where all
values across all docs for that one field are stored together, vs "row
stride" of a stored document, where all fields for each document are
stored together.
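
To make the two layouts concrete, here is a toy in-memory sketch. It is
only an illustration of the access pattern, not how Lucene lays out
bytes on disk; the class and method names (StrideSketch, readFromRows,
readFromColumn) are made up for this example.

```java
import java.util.List;
import java.util.Map;

/** Toy model of the two layouts; hypothetical names, not Lucene classes. */
final class StrideSketch {

    /** Row stride: one record per document, holding all of its fields. */
    static long[] readFromRows(List<Map<String, Object>> rows, String field, int[] hits) {
        long[] out = new long[hits.length];
        for (int i = 0; i < hits.length; i++) {
            // Every hit touches the whole record just to pick out one field.
            out[i] = (Long) rows.get(hits[i]).get(field);
        }
        return out;
    }

    /** Column stride: one array per field, indexed by document number. */
    static long[] readFromColumn(long[] column, int[] hits) {
        long[] out = new long[hits.length];
        for (int i = 0; i < hits.length; i++) {
            // Every hit touches only the values of the one field being read.
            out[i] = column[hits[i]];
        }
        return out;
    }
}
```

Both methods return the same values; the difference is how much
unrelated data each lookup has to pull in along the way.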

The default DV format is Lucene45DocValuesFormat; it tries to compress
the values, and then leaves the compressed form on disk and seeks for
each lookup, but often the OS will cache those pages in RAM, if your
application keeps them hot.

You should test that first; if it's still too slow, and you're willing
to use RAM, then swap in a different DVFormat for your field, e.g.
DirectDocValuesFormat is the most RAM-consuming (it stores a native Java
array under the hood) but should be the fastest.

Swapping in a custom DVFormat for a field is easy: just make your own
codec by subclassing the default Lucene46Codec, and override the
method getDocValuesFormatForField.

> 2. I tried looking for examples on how to use numeric doc values. I found
> that in new versions of lucene we have to use "AtomicReader".
> Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641
>
> So is this the code I am looking for:
> long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
>      IndexReader reader = searcher.getIndexReader();
>      long docVal = 0;
>      for (AtomicReaderContext rc : reader.leaves()) {
>         AtomicReader ar = rc.reader();
>         docVal = ar.getNumericDocValues().get(docId);
>      }
>      return docVal;
> }
>
> How do I know which docVal to return? It appears that each AtomicReader
> (every iteration of the loop) may return a docVal?

Looks like you solved this already ...
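
For anyone reading this in the archive: the usual answer is that only
one leaf owns a given top-level docId. Each segment numbers its own
documents from 0, and its docBase is the offset of its first document
in the top-level numbering, so you pick the segment that contains the
docId and look up docId - docBase there. Below is a minimal sketch of
that arithmetic with plain arrays standing in for segments. The class
SegmentedValues is hypothetical; in Lucene 4.x the corresponding pieces
should be ReaderUtil.subIndex, AtomicReaderContext.docBase and
AtomicReader.getNumericDocValues(field).

```java
/** Toy stand-in for a multi-segment reader; hypothetical, not a Lucene class. */
final class SegmentedValues {
    private final int[] docBases;       // top-level id of each segment's first doc
    private final long[][] perSegment;  // perSegment[seg][localDoc] = value

    SegmentedValues(long[][] perSegment) {
        this.perSegment = perSegment;
        this.docBases = new int[perSegment.length];
        int base = 0;
        for (int i = 0; i < perSegment.length; i++) {
            docBases[i] = base;             // segment i starts where the previous ones end
            base += perSegment[i].length;
        }
    }

    /** Looks up the value for a top-level docId. */
    long get(int docId) {
        // Find the last segment whose docBase is <= docId (linear scan for clarity).
        int seg = 0;
        while (seg + 1 < docBases.length && docBases[seg + 1] <= docId) {
            seg++;
        }
        // Convert the top-level id into a segment-local id before the lookup.
        return perSegment[seg][docId - docBases[seg]];
    }
}
```

With segments of sizes 3, 2 and 1, top-level doc 3 is local doc 0 of
the second segment, and top-level doc 5 is local doc 0 of the third.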

> 3. Can I only store NumericDocValues? Can I get something like
> StringDocValues? I have a string "id". I guess I could keep a mapping from
> numeric doc value (Long) to String but I want to avoid keeping two sources
> of information (Lucene Index and a HashMap). I can use SearcherManager to
> deal with concurrent searches and index updates (
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> but how about managing two data sources Lucene index and HashMap<Long,
> String> with SearcherManager? Is there a way to achieve this using a custom
> SearcherFactory?

There are also binary doc values, maybe that helps?

You may also want LiveFieldValues, if you need precise (real-time)
lookup of the id for all docs, including just indexed ones.

Mike McCandless

http://blog.mikemccandless.com


