lucene-java-user mailing list archives

From Ype Kingma <>
Subject Re: Mixing database and lucene searches
Date Tue, 11 May 2004 18:01:12 GMT
On Tuesday 11 May 2004 17:26, Gerard Sychay wrote:
> >>> "Eric Jain" <> 05/11/04 04:47AM >>>
> >>>
> > > Hits hits = TermQuery("text", "foo")
> > > Set hitPKs = new Set();
> > > for each doc in hits:
> > >    hitPKs.put(doc.getField("pk"))
> > Retrieving even one custom field for every document of a possibly large
> > data set can end up being very slow, it seems. This complicates things a
> > lot...
> Glen, I don't know your application specifics, but if you are paging
> results, there is no need to retrieve all the primary keys at once.  I
> had a similar problem.  I ended up doing the following:
> - Store ONLY the primary keys in the index.  Ideally, you only need
> two fields per Lucene Document: the tokenized text to be searched, and
> the corresponding stored primary key.
> - Upon searching, get all the Hits as normal, say you get 10000 hits.
> - But the first page only displays the first 10 hits, so retrieve the first
> 10 primary keys from the Hits, use these to form a SQL query and retrieve
> any info you need from the DB.  This way, you are only handling 10 documents
> at a time, or however many per page.
> In actual use, this is very fast.
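
The paging steps quoted above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the class PagedKeyLookup, the
table name "documents" and the column name "pk" are hypothetical, and the
ranked list of primary keys stands in for whatever the search returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene or JDBC.
class PagedKeyLookup {

    // Returns the primary keys belonging to one zero-based page of a ranked list.
    static List<String> pageOf(List<String> rankedKeys, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        int from = Math.min(page * pageSize, rankedKeys.size());
        int to = Math.min(from + pageSize, rankedKeys.size());
        return new ArrayList<>(rankedKeys.subList(from, to));
    }

    // Builds a parameterized statement with one placeholder per key.
    // The table and column names are assumptions made for this sketch.
    static String buildSql(int keyCount) {
        if (keyCount <= 0) {
            throw new IllegalArgumentException("need at least one key");
        }
        return "SELECT * FROM documents WHERE pk IN ("
                + String.join(", ", Collections.nCopies(keyCount, "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("k17", "k3", "k42", "k8", "k5");
        List<String> firstPage = pageOf(ranked, 0, 2);
        System.out.println(firstPage + " -> " + buildSql(firstPage.size()));
    }
}
```

The keys of the page would then be bound to the placeholders of a prepared
statement. Note that an IN clause does not return rows in ranked order, so
the rows need to be put back into the order of the page's keys afterwards.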

A Hits object caches some documents for you, but when you need more
control over stored field retrieval you can implement your own retrieval
and fetch the fields the way a database does:
get all the document numbers needed, sort them, and retrieve the (non-cached)
stored fields in that order. Normally, that (almost) minimizes the distance
the disk head needs to travel for the retrieval.
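
A minimal sketch of that idea, with Lucene itself left out: the class
SortedFieldFetch and the fetch function are hypothetical, and fetch stands in
for whatever reads the stored fields of one document number (in Lucene that
could be something like IndexReader.document(int)).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper, not a Lucene class.
class SortedFieldFetch {

    // Fetches one value per document number, reading in ascending document
    // number order, and returns the values in the original (ranked) order.
    static <T> List<T> fetchInDocOrder(int[] rankedDocNrs, IntFunction<T> fetch) {
        int[] sorted = rankedDocNrs.clone();
        Arrays.sort(sorted); // ascending document numbers

        Map<Integer, T> byDocNr = new HashMap<>();
        for (int docNr : sorted) {
            if (!byDocNr.containsKey(docNr)) { // read each document only once
                byDocNr.put(docNr, fetch.apply(docNr));
            }
        }

        // Restore the ranked order for display.
        List<T> inRankOrder = new ArrayList<>(rankedDocNrs.length);
        for (int docNr : rankedDocNrs) {
            inRankOrder.add(byDocNr.get(docNr));
        }
        return inRankOrder;
    }

    public static void main(String[] args) {
        int[] ranked = {42, 7, 19};
        // The lambda is a stand-in for a stored field read.
        System.out.println(fetchInDocOrder(ranked, docNr -> "pk-" + docNr));
    }
}
```

The reads happen in document number order, while the caller still gets the
results back in ranking order.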

Lucene stores all the stored fields of a single document close
together, so retrieving all stored fields isn't much more expensive than
retrieving only the primary key.

Kind regards,
