lucene-java-user mailing list archives

From Rohit Banga <iamrohitba...@gmail.com>
Subject Re: Question about Payloads in Lucene 4.5
Date Sat, 22 Mar 2014 02:25:30 GMT
Thanks Michael for your response.

A few questions:

1. Can I expect better performance when retrieving a single NumericDocValue
for all hits than when I retrieve the documents for all hits to fetch the
field value? As far as I understand, retrieving n documents from the index
requires n disk reads. How many disk reads do I perform when using
NumericDocValues? How are they stored?

2. I tried looking for examples on how to use numeric doc values. I found
that in new versions of Lucene we have to use "AtomicReader".
Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641

So is this the code I am looking for:
long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
     IndexReader reader = searcher.getIndexReader();
     long docVal = 0;
     for (AtomicReaderContext rc : reader.leaves()) {
        AtomicReader ar = rc.reader();
        docVal = ar.getNumericDocValues().get(docId);
     }
     return docVal;
}

How do I know which docVal to return? It appears that each AtomicReader
(every iteration of the loop) may return a docVal?
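
A note on the loop above: it queries every leaf with the same top-level doc
id and keeps whatever the last leaf returns. Each leaf numbers its documents
from 0, and AtomicReaderContext.docBase is the offset of that leaf in the
top-level id space, so only one leaf owns a given hit. The sketch below
models just that arithmetic in plain Java with no Lucene dependency.
LeafDocLookup, leafIndex and valueFor are hypothetical names, and the arrays
stand in for the leaves' docBase values and per-segment doc values. In
Lucene 4.x the leaf lookup is, as far as I can tell, what
ReaderUtil.subIndex does, and the MultiDocValues.getNumericValues call that
Mike suggests accepts top-level ids directly; please check the Javadoc for
your version.

```java
// Illustrative model only: plain Java, no Lucene classes involved.
// LeafDocLookup and its methods are hypothetical names for this sketch.
final class LeafDocLookup {

    // docBases[i] is the first top-level doc id covered by leaf i.
    // Assumes ascending order with docBases[0] == 0.
    // Returns the index of the last leaf whose docBase is <= globalDocId.
    static int leafIndex(int globalDocId, int[] docBases) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always progresses
            if (docBases[mid] <= globalDocId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // perLeafValues[i][j] stands in for the numeric doc value of
    // segment-local document j in leaf i.
    static long valueFor(int globalDocId, int[] docBases, long[][] perLeafValues) {
        int leaf = leafIndex(globalDocId, docBases);
        int localDocId = globalDocId - docBases[leaf]; // top-level id -> leaf-local id
        return perLeafValues[leaf][localDocId];
    }
}
```

With docBases {0, 3, 5}, top-level doc id 4 belongs to the second leaf and
maps to local id 1 there, so exactly one value comes back per hit.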

3. Can I only store NumericDocValues? Can I get something like
StringDocValues? I have a string "id". I guess I could keep a mapping from
numeric doc value (Long) to String, but I want to avoid keeping two sources
of information (the Lucene index and a HashMap). I can use SearcherManager to
deal with concurrent searches and index updates (
http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
but how would I manage two data sources, the Lucene index and a HashMap<Long,
String>, with SearcherManager? Is there a way to achieve this using a custom
SearcherFactory?


Thanks
Rohit Banga
http://iamrohitbanga.com/


On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> DocValues are better than payloads.
>
> E.g. index a NumericDocValuesField with each doc, holding your id.
>
> Then at search time you can use MultiDocValues.getNumericValues.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga <iamrohitbanga@gmail.com>
> wrote:
> > Hi everyone
> >
> > When I query a Lucene index, I get back a list of document ids. This index
> > search is fast. Now for all documents matching the query I need a unique
> > String field called "id" which is stored in the document. From the
> > documentation I gather that document ids are internal and I should not use
> > them for referencing my own data structures. Currently I iterate over all
> > the hits and for each one I get the document to read the field using
> > IndexReader.document().
> >
> > http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/index/IndexReader.html
> >
> > I read the "id" field from the document and then use it further in my
> > processing logic.
> > The problem is that reading all documents to get all the "id"s is turning
> > out to be very slow. It is the bottleneck in my application. It would be
> > nice if Lucene could return some metadata along with the internal
> > document id when I do a search. I do not want to read all documents just
> > to retrieve this metadata.
> >
> > The best solution I have come across searching on the net is to use
> > payloads which will be returned by the fast index search query along with
> > the document ids.
> >
> > Is my understanding correct that using payloads I can get the "id" string
> > field for all my documents faster than by reading each entire document?
> >
> > I am not able to find a good example of how to store and retrieve
> > payloads. Can you please point me to a good resource to learn how to use
> > payloads and how they will impact performance?
> > I am using Lucene 4.5.
> >
> > Thanks
> > Rohit Banga
> > http://iamrohitbanga.com/
>
