Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAG_AXeZ1CqS8g9Yk=jTSwt1OMj9BeNayYmOkzwDs1oV_PY-hnw@mail.gmail.com>
References: 
 <CAG_AXebvrRh9+nE7r1aR--q6qP6f8oHXi-BgRNc8F4JBfnhVPg@mail.gmail.com>
 <CAL8PwkbJHSUm83T+dKXM2nb8FrTqZG5vR_1Rr3cqZjSaQKJW_Q@mail.gmail.com>
 <CAG_AXeZ1CqS8g9Yk=jTSwt1OMj9BeNayYmOkzwDs1oV_PY-hnw@mail.gmail.com>
From: Rohit Banga <iamrohitbanga@gmail.com>
Date: Fri, 21 Mar 2014 19:38:55 -0700
Message-ID: 
 <CAG_AXeamrVrcOLk7wmDYproWTe4RyNxYNzRFiBZn9+GAAaW3wQ@mail.gmail.com>
Subject: Re: Question about Payloads in Lucene 4.5
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=f46d043bdf6a8c559804f528e86c

--f46d043bdf6a8c559804f528e86c
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Just saw the implementation of MultiDocValues.getNumericValues(). It returns
an anonymous inner class that fetches the doc value from the appropriate
index reader. Very cool implementation!
I guess that answers my question on how to get docVal from multiple
atomic readers.
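The delegation described above can be sketched as a small plain-Java model. It does not use Lucene's API: MultiLeafValues and its arrays are hypothetical stand-ins for a composite reader's leaves, with docStarts[i] playing the role of AtomicReaderContext.docBase. As far as I understand the real MultiDocValues code, the key step is the same: find the one leaf whose document range contains the top-level docID, then subtract that leaf's docBase before asking the leaf for its value.

```java
/**
 * Plain-Java model (not Lucene code): one numeric "doc values" view over
 * several hypothetical leaves (segments). Each leaf numbers its own documents
 * from 0; docStarts[i] plays the role of AtomicReaderContext.docBase.
 */
class MultiLeafValues {
    private final int[] docStarts;      // top-level docID of each leaf's first doc, ascending
    private final long[][] leafValues;  // leafValues[leaf][localDocID] = value
    private final int maxDoc;           // total number of documents across all leaves

    MultiLeafValues(long[][] leafValues) {
        this.leafValues = leafValues;
        this.docStarts = new int[leafValues.length];
        int base = 0;
        for (int i = 0; i < leafValues.length; i++) {
            docStarts[i] = base;          // this leaf's docBase
            base += leafValues[i].length;
        }
        this.maxDoc = base;
    }

    /** Last leaf whose docBase <= docID; this skips empty leaves that share a docBase. */
    int leafIndex(int docID) {
        int lo = 0, hi = docStarts.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (docStarts[mid] <= docID) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /** Value for a top-level docID: pick the owning leaf, then subtract its docBase. */
    long get(int docID) {
        if (docID < 0 || docID >= maxDoc) {
            throw new IndexOutOfBoundsException("docID " + docID + " not in [0, " + maxDoc + ")");
        }
        int leaf = leafIndex(docID);
        return leafValues[leaf][docID - docStarts[leaf]];
    }
}
```

This also explains why the loop in question 2 keeps overwriting docVal: it asks every leaf for the same top-level document id, instead of asking only the leaf that contains it, with that leaf's docBase subtracted.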

It would be nice if you could help me with the other two questions though.

Thanks
Rohit Banga
http://iamrohitbanga.com/


On Fri, Mar 21, 2014 at 7:25 PM, Rohit Banga <iamrohitbanga@gmail.com> wrote:

> Thanks, Michael, for your response.
>
> A few questions:
>
> 1. Can I expect better performance when retrieving a single
> NumericDocValue for all hits than when retrieving the documents for all hits
> to fetch the field value? As far as I understand, retrieving n documents from
> the index requires n disk reads. How many disk reads do I do when using
> NumericDocValues? How are they stored?
>
> 2. I tried looking for examples of how to use numeric doc values. I found
> that in newer versions of Lucene we have to use "AtomicReader".
> Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641
>
> So is this the code I am looking for:
> long getNumericDocValueForDocument(IndexSearcher searcher, String field, int docId) throws IOException {
>      IndexReader reader = searcher.getIndexReader();
>      long docVal = 0;
>      for (AtomicReaderContext rc : reader.leaves()) {
>         AtomicReader ar = rc.reader();
>         docVal = ar.getNumericDocValues(field).get(docId);
>      }
>      return docVal;
> }
>
> How do I know which docVal to return? It appears that each AtomicReader
> (every iteration of the loop) may return a docVal?
>
> 3. Can I only store NumericDocValues? Can I get something like
> StringDocValues? I have a string "id". I guess I could keep a mapping from
> numeric doc value (Long) to String, but I want to avoid keeping two sources
> of information (the Lucene index and a HashMap). I can use SearcherManager to
> deal with concurrent searches and index updates
> (http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> but how do I manage two data sources, the Lucene index and a HashMap<Long,
> String>, with SearcherManager? Is there a way to achieve this using a custom
> SearcherFactory?
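Regarding question 3: Lucene 4.x does have string-capable doc values types, BinaryDocValuesField and SortedDocValuesField, which should be readable across segments through MultiDocValues.getBinaryValues and MultiDocValues.getSortedValues, so a separate HashMap<Long, String> may not be needed. The sketch below is a plain-Java model, not Lucene code, of the idea behind the sorted variant: each document stores a small ordinal, and each distinct string is stored once in a sorted dictionary (Lucene exposes these as SortedDocValues.getOrd and lookupOrd). The class name SortedStringColumn and the -1 "missing value" marker are assumptions of this model.

```java
import java.util.Arrays;
import java.util.TreeSet;

/**
 * Plain-Java model (not Lucene code) of sorted string doc values:
 * a per-document ordinal plus a deduplicated, sorted term dictionary.
 */
class SortedStringColumn {
    private final String[] dict;  // distinct values in natural (sorted) order
    private final int[] ords;     // ords[docID] = index into dict, or -1 if the doc has no value

    SortedStringColumn(String[] valuesByDoc) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : valuesByDoc) {
            if (v != null) {
                distinct.add(v);
            }
        }
        dict = distinct.toArray(new String[0]);
        ords = new int[valuesByDoc.length];
        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            String v = valuesByDoc[doc];
            // dict is sorted with the same ordering as the TreeSet, so binary search finds the ordinal
            ords[doc] = (v == null) ? -1 : Arrays.binarySearch(dict, v);
        }
    }

    int getOrd(int docID) { return ords[docID]; }

    String lookupOrd(int ord) { return dict[ord]; }

    int getValueCount() { return dict.length; }

    /** Convenience: the string for a document, or null if it has no value. */
    String get(int docID) {
        int ord = getOrd(docID);
        return ord < 0 ? null : lookupOrd(ord);
    }
}
```

Repeated values cost one int per document plus a single dictionary entry, which is why the sorted form suits fields with many duplicates; a unique "id" per document gains little from deduplication, and the binary variant stores the bytes directly.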
>
>
> Thanks
> Rohit Banga
> http://iamrohitbanga.com/
>
>
> On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> DocValues are better than payloads.
>>
>> E.g. index a NumericDocValuesField with each doc, holding your id.
>>
>> Then at search time you can use MultiDocValues.getNumericValues.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga <iamrohitbanga@gmail.com>
>> wrote:
>> > Hi everyone
>> >
>> > When I query a Lucene index, I get back a list of document ids. This index
>> > search is fast. Now, for every document matching the query, I need a unique
>> > String field called "id" which is stored in the document. From the
>> > documentation I gather that document ids are internal and I should not use
>> > them to reference my own data structures. Currently I iterate over all
>> > the hits matching the query and, for each one, load the document to
>> > read the field using IndexReader.document().
>> >
>> > http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/index/IndexReader.html
>> >
>> > I read the "id" field from the document and then use it further in my
>> > processing logic.
>> > The problem is that reading all documents to get all the "id"s is turning
>> > out to be very slow. It is the bottleneck in my application. It would be
>> > nice if Lucene could return some metadata along with the internal
>> > document id when I do a search. I do not want to read all documents just
>> > to retrieve this metadata.
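The bottleneck described above comes from the access pattern. The following plain-Java model (hypothetical names, entirely in memory, so it says nothing about actual timings) contrasts fetching "id" by loading each hit's whole document, as IndexReader.document() does, against reading it from one per-field array indexed by docID, which is the column-oriented layout doc values use. Storing the id as a long in the column follows Mike's suggestion of a NumericDocValuesField.

```java
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model (not Lucene code) of two ways to fetch one field per hit.
 */
class FieldFetchModel {
    /** Row layout (like stored fields): every field of a hit is materialized just to read "id". */
    static long[] idsFromRows(List<Map<String, String>> docs, int[] hits) {
        long[] out = new long[hits.length];
        for (int i = 0; i < hits.length; i++) {
            Map<String, String> doc = docs.get(hits[i]); // whole document loaded per hit
            out[i] = Long.parseLong(doc.get("id"));
        }
        return out;
    }

    /** Column layout (like numeric doc values): one array for the field, indexed by docID. */
    static long[] idsFromColumn(long[] idColumn, int[] hits) {
        long[] out = new long[hits.length];
        for (int i = 0; i < hits.length; i++) {
            out[i] = idColumn[hits[i]]; // a single lookup per hit, no other fields touched
        }
        return out;
    }
}
```

In Lucene both structures live on disk, and the real cost depends on the codec and on caching; the model only shows why reaching one field per hit touches far less data in a column than in a row.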
>> >
>> > The best solution I have come across searching on the net is to use
>> > payloads, which would be returned by the fast index search query along
>> > with the document ids.
>> >
>> > Is my understanding correct that using payloads I can get the "id" string
>> > field for all my documents faster than by reading the entire documents?
>> >
>> > I am not able to find a good example of how to store and retrieve
>> > payloads. Can you please point me to a good resource to learn how to use
>> > payloads and how they will impact performance?
>> > I am using Lucene 4.5.
>> >
>> > Thanks
>> > Rohit Banga
>> > http://iamrohitbanga.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

--f46d043bdf6a8c559804f528e86c--