lucene-java-user mailing list archives

From Rohit Banga <iamrohitba...@gmail.com>
Subject Re: Question about Payloads in Lucene 4.5
Date Sat, 22 Mar 2014 02:38:55 GMT
Just saw the implementation of MultiDocValues.getNumericValues(). It returns
an anonymous inner class that gets the doc value from the appropriate index
reader. Very cool implementation!
I guess that answers my question on how to get the doc value from multiple
atomic readers.
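
The lookup described above can be modeled without Lucene. The following is only a toy sketch, not Lucene's actual code: the class SegmentedDocValues and its methods are hypothetical names, and plain long[] arrays stand in for per-segment NumericDocValues. It assumes each segment records the global doc ID of its first document (its "doc base"), so a top-level doc ID can be mapped to exactly one segment by binary search and then read at a segment-local offset.

```java
import java.util.Arrays;

/** Toy model (hypothetical, not Lucene code): per-segment values addressed by a global doc ID. */
public class SegmentedDocValues {
    private final long[][] segments; // segments[i][localDoc] = value for that doc
    private final int[] docBase;     // docBase[i] = global ID of the first doc in segment i
    private final int maxDoc;        // total number of docs across all segments

    public SegmentedDocValues(long[][] segments) {
        this.segments = segments;
        this.docBase = new int[segments.length];
        int base = 0;
        for (int i = 0; i < segments.length; i++) {
            docBase[i] = base;
            base += segments[i].length;
        }
        this.maxDoc = base;
    }

    /** Returns the index of the one segment that contains globalDoc. */
    public int segmentFor(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IndexOutOfBoundsException("doc " + globalDoc + " not in [0, " + maxDoc + ")");
        }
        int idx = Arrays.binarySearch(docBase, globalDoc);
        if (idx < 0) {
            // Not an exact hit: binarySearch returned -(insertionPoint) - 1,
            // and the owning segment is the one just before the insertion point.
            idx = -idx - 2;
        }
        // Skip empty segments that share the same doc base.
        while (idx + 1 < docBase.length && docBase[idx + 1] == docBase[idx]) {
            idx++;
        }
        return idx;
    }

    /** Looks up the value for a global doc ID by translating it to a segment-local ID. */
    public long get(int globalDoc) {
        int seg = segmentFor(globalDoc);
        return segments[seg][globalDoc - docBase[seg]];
    }
}
```

With three segments holding {10, 11}, {20} and {30, 31, 32}, global doc 3 falls in the third segment at local offset 0. This is why only one reader answers for any given doc ID, rather than every reader in a loop.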

It would be nice if you could help me with the other two questions though.

Thanks
Rohit Banga
http://iamrohitbanga.com/


On Fri, Mar 21, 2014 at 7:25 PM, Rohit Banga <iamrohitbanga@gmail.com> wrote:

> Thanks Michael for your response.
>
> A few questions:
>
> 1. Can I expect better performance when retrieving a single
> NumericDocValue for all hits vs when I retrieve documents for all hits to
> fetch the field value? As far as I understand, retrieving n documents from
> the index requires n disk reads. How many disk reads do I do when using
> NumericDocValues? How are they stored?
>
> 2. I tried looking for examples of how to use numeric doc values. I found
> that in new versions of Lucene we have to use "AtomicReader".
> Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641
>
> So is this the code I am looking for:
> long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
>      IndexReader reader = searcher.getIndexReader();
>      long docVal = 0;
>      for (AtomicReaderContext rc : reader.leaves()) {
>         AtomicReader ar = rc.reader();
>         docVal = ar.getNumericDocValues().get(docId);
>      }
>      return docVal;
> }
>
> How do I know which docVal to return? It appears that each AtomicReader
> (every iteration of the loop) may return a docVal?
>
> 3. Can I only store NumericDocValues? Can I get something like
> StringDocValues? I have a string "id". I guess I could keep a mapping from
> numeric doc value (Long) to String but I want to avoid keeping two sources
> of information (Lucene Index and a HashMap). I can use SearcherManager to
> deal with concurrent searches and index updates (
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> but how about managing two data sources, the Lucene index and a
> HashMap<Long, String>, with SearcherManager? Is there a way to achieve this
> using a custom SearcherFactory?
>
>
> Thanks
> Rohit Banga
> http://iamrohitbanga.com/
>
>
> On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> DocValues are better than payloads.
>>
>> E.g. index a NumericDocValuesField with each doc, holding your id.
>>
>> Then at search time you can use MultiDocValues.getNumericValues.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga <iamrohitbanga@gmail.com>
>> wrote:
>> > Hi everyone
>> >
>> > When I query a Lucene index, I get back a list of document ids. This index
>> > search is fast. Now, for all documents matching the query, I need a unique
>> > String field called "id" which is stored in the document. From the
>> > documentation I gather that document ids are internal and I should not use
>> > them for referencing my own data structures. Currently I iterate over all
>> > the hits and, for each one, get the document to read the field using
>> > IndexReader.document().
>> >
>> > http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/index/IndexReader.html
>> >
>> > I read the "id" field from the document and then use it further in my
>> > processing logic.
>> > The problem is that reading all documents to get all the "id"s is turning
>> > out to be very slow. It is the bottleneck in my application. It would be
>> > nice if Lucene could return some metadata along with the internal document
>> > id when I do a search. I do not want to read all documents just to retrieve
>> > this metadata.
>> >
>> > The best solution I have come across searching on the net is to use
>> > payloads, which would be returned by the fast index search query along with
>> > the document ids.
>> >
>> > Is my understanding correct that using payloads I can get the "id" string
>> > field for all my documents faster than reading each entire document?
>> >
>> > I am not able to find a good example of how to store and retrieve payloads.
>> > Can you please point me to a good resource to learn how to use payloads and
>> > how they will impact performance?
>> > I am using Lucene 4.5.
>> >
>> > Thanks
>> > Rohit Banga
>> > http://iamrohitbanga.com/