From: Ype Kingma
To: lucene-user@jakarta.apache.org
Subject: Re: Mixing database and lucene searches
Date: Tue, 11 May 2004 20:01:12 +0200
Message-Id: <200405112001.13090.ykingma@xs4all.nl>

On Tuesday 11 May 2004 17:26, Gerard Sychay wrote:
> >>> "Eric Jain" 05/11/04 04:47AM >>>
> > > Hits hits = searcher.search(new TermQuery("text", "foo")
> > > Set hitPKs = new Set();
> > > for each doc in hits:
> > >     hitPKs.put(doc.getField("pk"))
> >
> > Retrieving even one custom field for every document of a possibly
> > large data set can end up being very slow, it seems. This
> > complicates things a lot...
>
> Glen, I don't know your application specifics, but if you are paging
> results, there is no need to retrieve all the primary keys at once. I
> had a similar problem. I ended up doing the following:
>
> - Store ONLY the primary keys in the index. Ideally, you only need
>   two fields per Lucene Document: the tokenized text to be searched,
>   and the stored corresponding primary key.
> - Upon searching, get all the Hits as normal; say you get 10000 hits.
> - But the first page only displays the first 10 hits, so retrieve the
>   first 10 primary keys from the Hits, use these to form a SQL query,
>   and retrieve any info you need from the DB. This way, you are only
>   handling 10 documents at a time, or however many per page.
>
> In actual use, this is very fast.
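The paging approach quoted above could be sketched roughly as follows. This is only an illustration in plain Java collections, not Lucene or JDBC code: the class HitPager, its two methods, and the table and column names passed to them are all hypothetical, and the ranked list of primary keys stands in for whatever the Hits object would supply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of Lucene or of the original posts.
final class HitPager {

    /** Returns the primary keys belonging to one zero-based page of a ranked hit list. */
    static List<String> pageOfKeys(List<String> rankedKeys, int page, int pageSize) {
        int from = Math.min(page * pageSize, rankedKeys.size());
        int to = Math.min(from + pageSize, rankedKeys.size());
        // Copy the sub-list so the caller gets an independent list.
        return new ArrayList<>(rankedKeys.subList(from, to));
    }

    /** Builds a parameterized SELECT with one '?' placeholder per key on the page. */
    static String selectByKeys(String table, String keyColumn, int keyCount) {
        if (keyCount == 0) {
            // An empty page: return a statement that matches no rows.
            return "SELECT * FROM " + table + " WHERE 1 = 0";
        }
        String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
        return "SELECT * FROM " + table + " WHERE " + keyColumn
                + " IN (" + placeholders + ")";
    }
}
```

The statement uses placeholders so that the keys can be bound as parameters instead of being concatenated into the SQL text. Note that a database is free to return the rows of an IN query in any order, so the rows would still need to be put back into hit order by primary key.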
A Hits object caches some documents for you, but when you need more control over stored field retrieval you can implement your own retrieval the way a database does it: get all the document numbers needed, sort them, and retrieve the (non-cached) stored fields in that order. Normally, that (almost) minimizes the distance the disk head needs to travel for the retrieval.

Lucene stores all the fields of a single document close together, so retrieving all stored fields isn't much more expensive than retrieving only the primary key.

Kind regards,
Ype
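As a rough illustration of the sorted retrieval described above, here is a minimal sketch in plain Java. It does not use the Lucene API: the class SortedFieldFetch and its method are hypothetical, and the reader argument is an assumed stand-in for whatever call loads a stored field for a document number.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper: not part of Lucene.
final class SortedFieldFetch {

    /**
     * Reads one stored value per document number in ascending document-number
     * order, then returns the values arranged in the original hit order.
     */
    static String[] fetchInDocOrder(int[] docIdsInHitOrder, IntFunction<String> reader) {
        // Sort a copy so the caller's ranking is left untouched.
        int[] sorted = docIdsInHitOrder.clone();
        Arrays.sort(sorted);

        // Read each distinct document once, in ascending order.
        Map<Integer, String> byDoc = new HashMap<>();
        for (int docId : sorted) {
            if (!byDoc.containsKey(docId)) {
                byDoc.put(docId, reader.apply(docId));
            }
        }

        // Map the values back to the ranking order of the hits.
        String[] result = new String[docIdsInHitOrder.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = byDoc.get(docIdsInHitOrder[i]);
        }
        return result;
    }
}
```

The values are read in document-number order but handed back in hit order, so the caller still sees the results ranked by score.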