From: "Gerard Sychay"
Date: Tue, 11 May 2004 11:26:06 -0400
Subject: Re: Mixing database and lucene searches
List-Id: "Lucene Users List"

>>> "Eric Jain" 05/11/04 04:47AM >>>
> > Hits hits = searcher.search(new TermQuery("text", "foo")
> > Set hitPKs = new Set();
> > for each doc in hits:
> >   hitPKs.put(doc.getField("pk"))
>
> Retrieving even one custom field for every document of a possibly large
> data set can end up being very slow, it seems. This complicates things
> a lot...

Glen,

I don't know your application specifics, but if you are paging results,
there is no need to retrieve all the primary keys at once. I had a
similar problem. I ended up doing the following:

- Store ONLY the primary keys in the index. Ideally, you only need two
  fields per Lucene Document: the tokenized text to be searched, and the
  corresponding primary key as a stored field.

- Upon searching, get all the Hits as normal. Say you get 10000 hits.

- The first page only displays the first 10 hits, so retrieve the first
  10 primary keys from the Hits, use these to form a SQL query, and
  retrieve any info you need from the DB.

This way, you are only handling 10 documents at a time, or however many
you show per page. In actual use, this is very fast.

HTH
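
The paging step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the application mentioned in the message: the class `PagedKeyLookup`, its method names, and the table and column names `documents` and `pk` are all hypothetical, and a plain list of strings stands in for the ranked primary keys that would be read from the Lucene hits for the requested page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: none of these names come from Lucene or from the original post.
final class PagedKeyLookup {

    /** Returns the primary keys for one page (0-based) of a ranked key list. */
    public static List<String> pageOfKeys(List<String> rankedKeys, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize must be > 0");
        }
        long start = (long) page * pageSize;
        if (start >= rankedKeys.size()) {
            return Collections.emptyList(); // page is past the last hit
        }
        int from = (int) start;
        int to = (int) Math.min(rankedKeys.size(), start + pageSize);
        return new ArrayList<>(rankedKeys.subList(from, to));
    }

    /**
     * Builds a parameterized query with one placeholder per key.
     * The table "documents" and column "pk" are assumed names.
     */
    public static String buildQuery(int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("keyCount must be > 0");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM documents WHERE pk IN (");
        for (int i = 0; i < keyCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Stand-in for the ranked primary keys of a search result.
        List<String> rankedKeys = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            rankedKeys.add("pk" + i);
        }
        List<String> firstPage = pageOfKeys(rankedKeys, 0, 10);
        System.out.println(firstPage);
        System.out.println(buildQuery(firstPage.size()));
    }
}
```

The keys of the page would then be bound to the placeholders of a JDBC `PreparedStatement`. Note that an `IN (...)` query does not return rows in relevance order, so the rows need to be re-sorted to match the order of the page's keys before display.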