lucene-java-user mailing list archives

From Rohit Banga <>
Subject Re: Question about Payloads in Lucene 4.5
Date Sat, 22 Mar 2014 02:25:30 GMT
Thanks Michael for your response.

A few questions:

1. Can I expect better performance when retrieving a single NumericDocValues
value for all hits than when I load the stored document for each hit to fetch
the field value? As far as I understand, retrieving n documents from the index
requires n disk reads. How many disk reads do I do when using
NumericDocValues? How are they stored?

2. I tried looking for examples of how to use numeric doc values. I found
that in newer versions of Lucene we have to use "AtomicReader".
Found this:

So is this the code I am looking for:
long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
     IndexReader reader = searcher.getIndexReader();
     long docVal = 0;
     for (AtomicReaderContext rc : reader.leaves()) {
        AtomicReader ar = rc.reader();
        docVal = ar.getNumericDocValues().get(docId);
     }
     return docVal;
}

How do I know which docVal to return? It appears that each AtomicReader
(every iteration of the loop) may return a docVal?
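
For the "which docVal" question, it may help to see the arithmetic in isolation. Each leaf (segment) covers a contiguous range of top-level doc ids starting at its doc base, so exactly one leaf owns a given id, and the value is read with the leaf-relative id (top-level id minus doc base). The sketch below is a plain-Java model of that lookup, not Lucene code: the class name, method names, and arrays are hypothetical, and it assumes every leaf holds at least one document so the doc bases are distinct. In Lucene 4.x the corresponding pieces are `ReaderUtil.subIndex(docId, reader.leaves())` and `AtomicReaderContext.docBase`, and `MultiDocValues.getNumericValues(reader, field)`, mentioned in the reply quoted below, performs this mapping for you.

```java
import java.util.Arrays;

/** Plain-Java model of mapping a top-level doc id to one leaf (segment). */
final class LeafLookupSketch {

    /**
     * Returns the index of the leaf that owns the given top-level doc id.
     * docBases[i] is the first top-level doc id of leaf i, ascending and distinct.
     */
    static int leafIndex(int globalDocId, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Exact hit: the id is the first document of that leaf.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // leaf is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Reads the per-document value for a top-level doc id from per-leaf columns. */
    static long valueFor(int globalDocId, int[] docBases, long[][] valuesPerLeaf) {
        int leaf = leafIndex(globalDocId, docBases);
        int localDocId = globalDocId - docBases[leaf]; // leaf-relative id
        return valuesPerLeaf[leaf][localDocId];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 5};                       // three leaves
        long[][] values = {{10, 11, 12}, {20, 21}, {30, 31, 32, 33}};
        // Top-level id 4 falls in the second leaf at local id 1.
        System.out.println(valueFor(4, docBases, values));
    }
}
```

Applied to the loop above, this means only one iteration is relevant for a given hit: the leaf whose range contains the doc id, queried with the leaf-relative id rather than the top-level one.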

3. Can I only store NumericDocValues? Can I get something like
StringDocValues? I have a string "id". I guess I could keep a mapping from
numeric doc value (Long) to String, but I want to avoid keeping two sources
of information (the Lucene index and a HashMap). I can use SearcherManager to
deal with concurrent searches and index updates,
but how would I manage two data sources, the Lucene index and a HashMap<Long,
String>, with SearcherManager? Is there a way to achieve this using a custom
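
On the string question: Lucene 4.x also ships `BinaryDocValuesField` and `SortedDocValuesField`, which hold byte/string values per document inside the index, so a separate HashMap should not be needed. The sketch below is a plain-Java model of the idea behind the sorted variant (one ordinal per document plus a sorted dictionary of unique values). It is not Lucene's implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Plain-Java model of a per-document string column stored as ordinals. */
final class OrdinalColumnSketch {
    private final String[] dictionary; // unique values in sorted order
    private final int[] ordByDoc;      // one dictionary ordinal per document

    OrdinalColumnSketch(String[] valueByDoc) {
        // Build the sorted dictionary of unique values.
        dictionary = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        ordByDoc = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            // Every value is in the dictionary, so the search always hits.
            ordByDoc[doc] = Arrays.binarySearch(dictionary, valueByDoc[doc]);
        }
    }

    /** Ordinal stored for the document. */
    int ord(int docId) { return ordByDoc[docId]; }

    /** String for a dictionary ordinal. */
    String lookup(int ord) { return dictionary[ord]; }

    /** Convenience: the document's string value. */
    String value(int docId) { return lookup(ord(docId)); }
}
```

For a field that is unique per document, such as an "id", the dictionary gains nothing from deduplication, so the binary variant is probably the closer fit. Either way the value lives in the same index that SearcherManager already refreshes.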

Rohit Banga

On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <> wrote:

> DocValues are better than payloads.
> E.g. index a NumericDocValuesField with each doc, holding your id.
> Then at search time you can use MultiDocValues.getNumericValues.
> Mike McCandless
> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga <>
> wrote:
> > Hi everyone
> >
> > When I query a lucene index, I get back a list of document ids. This index
> > search is fast. Now for all documents matching the result I need a unique
> > String field called "id" which is stored in the document. From the
> > documentation I gather that document ids are internal and I should not use
> > them for referencing my own data structures. Currently I iterate over all
> > the hits matching the document and then for each one I get the document to
> > read the field using IndexReader.document().
> >
> >
> > I read the "id" field from the document and then use it further in my
> > processing logic.
> > The problem is that reading all documents to get all "id"'s is turning out
> > to be very slow. It is the bottleneck in my application. It would be nice
> > to have a way if lucene could return some metadata along with the internal
> > document id when I did a search. I do not want to read all documents just
> > to retrieve this metadata.
> >
> > The best solution I have come across searching on the net is to use
> > payloads which will be returned by the fast index search query along with
> > the document ids.
> >
> > Is my understanding correct that using payloads I can get "id" string field
> > for all my documents faster than reading my entire document?
> >
> > I am not able to find a good example of how to store and retrieve payloads?
> > Can you please point me to a good resource to learn how to use payloads and
> > how they will impact performance?
> > I am using Lucene 4.5.
> >
> > Thanks
> > Rohit Banga
> >