From: Rohit Banga
Date: Fri, 21 Mar 2014 19:38:55 -0700
Subject: Re: Question about Payloads in Lucene 4.5
To: java-user@lucene.apache.org

Just saw the implementation of MultiDocValues.getNumericValues(). It sort
of returns an anonymous inner class that gets the doc value from the
appropriate index reader. Very cool implementation!

I guess that answers my question on how to get a docVal from multiple
atomic readers.
It would be nice if you could help me with the other two questions though.

Thanks
Rohit Banga
http://iamrohitbanga.com/


On Fri, Mar 21, 2014 at 7:25 PM, Rohit Banga wrote:

> Thanks Michael for your response.
>
> Few questions:
>
> 1. Can I expect better performance when retrieving a single
> NumericDocValue for all hits vs. when I retrieve documents for all hits to
> fetch the field value? As far as I understand, retrieving n documents from
> the index requires n disk reads. How many disk reads do I do when using
> NumericDocValues? How are they stored?
>
> 2. I tried looking for examples on how to use numeric doc values. I found
> that in new versions of Lucene we have to use "AtomicReader".
> Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641
>
> So is this the code I am looking for:
>
> long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
>     IndexReader reader = searcher.getIndexReader();
>     long docVal = 0;
>     for (AtomicReaderContext rc : reader.leaves()) {
>         AtomicReader ar = rc.reader();
>         docVal = ar.getNumericDocValues().get(docId);
>     }
>     return docVal;
> }
>
> How do I know which docVal to return? It appears that each AtomicReader
> (every iteration of the loop) may return a docVal?
>
> 3. Can I only store NumericDocValues? Can I get something like
> StringDocValues? I have a string "id". I guess I could keep a mapping from
> numeric doc value (Long) to String, but I want to avoid keeping two sources
> of information (Lucene index and a HashMap). I can use SearcherManager to
> deal with concurrent searches and index updates (
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> but how about managing two data sources, a Lucene index and a
> HashMap<Long, String>, with SearcherManager? Is there a way to achieve this
> using a custom SearcherFactory?
>
>
> Thanks
> Rohit Banga
> http://iamrohitbanga.com/
>
>
> On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> DocValues are better than payloads.
>>
>> E.g. index a NumericDocValuesField with each doc, holding your id.
>>
>> Then at search time you can use MultiDocValues.getNumericValues.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga wrote:
>> > Hi everyone
>> >
>> > When I query a Lucene index, I get back a list of document ids. This
>> > index search is fast. Now for all documents matching the result I need a
>> > unique String field called "id" which is stored in the document.
>> > From the documentation I gather that document ids are internal and I
>> > should not use them for referencing my own data structures. Currently I
>> > iterate over all the hits matching the query and then for each one I get
>> > the document to read the field using IndexReader.document().
>> > http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/index/IndexReader.html
>> >
>> > I read the "id" field from the document and then use it further in my
>> > processing logic.
>> > The problem is that reading all documents to get all the "id"s is turning
>> > out to be very slow. It is the bottleneck in my application. It would be
>> > nice if Lucene could return some metadata along with the internal
>> > document id when I did a search. I do not want to read all documents just
>> > to retrieve this metadata.
>> >
>> > The best solution I have come across searching on the net is to use
>> > payloads, which will be returned by the fast index search query along
>> > with the document ids.
>> >
>> > Is my understanding correct that using payloads I can get the "id" string
>> > field for all my documents faster than reading my entire document?
>> >
>> > I am not able to find a good example of how to store and retrieve
>> > payloads. Can you please point me to a good resource to learn how to use
>> > payloads and how they will impact performance?
>> > I am using Lucene 4.5.
>> >
>> > Thanks
>> > Rohit Banga
>> > http://iamrohitbanga.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
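
On question 2 in the thread (which docVal to return when looping over the leaves): only one leaf owns a given top-level doc ID, so the loop should not ask every leaf. The following is a minimal sketch of that lookup in plain Java, with no Lucene dependency. The class LeafResolver, its methods, and the sample arrays are hypothetical names made up for illustration. As far as I know, in Lucene 4.x the per-leaf offset is exposed as AtomicReaderContext.docBase and ReaderUtil.subIndex performs the equivalent search, but treat that mapping as an assumption to check against the Javadocs. The sketch also assumes no leaf is empty, so all doc bases are distinct.

```java
import java.util.Arrays;

/** Plain-Java model (hypothetical, not Lucene code) of resolving a top-level doc ID to a leaf. */
final class LeafResolver {

    /**
     * Returns the index of the leaf that contains globalDocId.
     * docBases[i] is the first top-level doc ID of leaf i; the array is
     * sorted ascending, starts at 0, and has no duplicates (assumption).
     */
    static int leafIndex(int globalDocId, int[] docBases) {
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Exact hit: globalDocId is the first document of leaf pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // leaf is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Reads the value for globalDocId from the single leaf that owns it. */
    static long numericValue(int globalDocId, int[] docBases, long[][] valuesPerLeaf) {
        int leaf = leafIndex(globalDocId, docBases);
        // Per-leaf values are addressed by leaf-local IDs, so subtract the base.
        int localDocId = globalDocId - docBases[leaf];
        return valuesPerLeaf[leaf][localDocId];
    }

    public static void main(String[] args) {
        // Three leaves holding 3, 2 and 4 documents respectively.
        int[] docBases = {0, 3, 5};
        long[][] values = {{10, 11, 12}, {20, 21}, {30, 31, 32, 33}};
        System.out.println(numericValue(4, docBases, values)); // prints 21
    }
}
```

Subtracting the base is the step the quoted loop is missing: it passes the top-level ID to every leaf and keeps whatever the last one returns.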