Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Message-Id: <s0a0b85b.001@n6mcgw16.cchmc.org>
Date: Tue, 11 May 2004 11:26:06 -0400
From: "Gerard Sychay" <Gerard.Sychay@cchmc.org>
To: <lucene-user@jakarta.apache.org>
Subject: Re: Mixing database and lucene searches
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

>>> "Eric Jain" <Eric.Jain@isb-sib.ch> 05/11/04 04:47AM >>>
> > Hits hits = searcher.search(new TermQuery(new Term("text", "foo")));
> > Set hitPKs = new HashSet();
> > // collect the stored primary key of every hit
> > for (int i = 0; i < hits.length(); i++)
> >    hitPKs.add(hits.doc(i).get("pk"));
> 
> Retrieving even one custom field for every document of a possibly large
> data set can end up being very slow, it seems. This complicates things a
> lot...

Glen, I don't know your application specifics, but if you are paging
results, there is no need to retrieve all the primary keys at once.  I
had a similar problem.  I ended up doing the following:

- Store ONLY the primary keys in the index.  Ideally, you only need
two fields per Lucene Document: the tokenized text to be searched
(indexed but not stored), and the corresponding stored primary key.
- Upon searching, get the Hits as normal; say you get 10,000 hits.
- But the first page only displays the first 10 hits, so retrieve the first
10 primary keys from the Hits, use them to form an SQL query, and retrieve
any info you need from the DB.  This way, you are only handling 10
documents at a time, or however many you show per page.
In actual use, this is very fast.
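The paging steps above might be sketched in Java roughly as follows. This is a hypothetical illustration, not the actual code described in this message: `KeySource` is an invented stand-in for Lucene's `Hits` (with Lucene 1.x you could back it with `hits.length()` and `hits.doc(i).get("pk")`), and the table and column names passed to `inQuery` are assumptions. The point is that only the hits on the requested page are ever touched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: fetch only one page of primary keys, then look them up in the DB. */
public class PagedLookup {

    /** Hypothetical stand-in for a ranked hit list such as Lucene's Hits. */
    public interface KeySource {
        int length();      // total number of hits
        String pk(int i);  // stored primary key of the i-th hit (0-based)
    }

    /** Returns the primary keys for one page, reading only that page's hits. */
    public static List<String> pageKeys(KeySource hits, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        int start = page * pageSize;
        int end = Math.min(start + pageSize, hits.length());
        if (start >= end) {
            return Collections.emptyList(); // page lies past the last hit
        }
        List<String> keys = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            keys.add(hits.pk(i));
        }
        return keys;
    }

    /**
     * Builds a parameterized query with one '?' per key, meant to be bound
     * through a JDBC PreparedStatement. Table and column names are hypothetical.
     */
    public static String inQuery(String table, String pkColumn, int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("keyCount must be > 0");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ")
                .append(table).append(" WHERE ").append(pkColumn).append(" IN (");
        for (int i = 0; i < keyCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Fake result list of 10000 hits whose keys are "pk0", "pk1", ...
        KeySource fake = new KeySource() {
            public int length() { return 10000; }
            public String pk(int i) { return "pk" + i; }
        };
        List<String> firstPage = pageKeys(fake, 0, 10);
        System.out.println(firstPage);
        System.out.println(inQuery("documents", "id", firstPage.size()));
    }
}
```

Because SQL `IN` does not return rows in any guaranteed order, the rows fetched from the DB would need to be re-sorted to match the key list so the page keeps Lucene's relevance ranking. Binding the keys through a `PreparedStatement` rather than pasting them into the SQL string also avoids quoting problems.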

HTH
