lucene-dev mailing list archives

From Robert Newson <>
Subject Re: Hits not serializable. (bulk document retrieval)
Date Mon, 27 Jun 2005 00:52:33 GMT

Thanks for the suggestion. I have solved this problem locally, but I'm 
wondering whether this should be in Lucene core.

I have seven machines in a rack, each with Lucene indexes of about 30 
million messages. I'm trying to search across them with 
RemoteSearchable and ParallelMultiSearcher.

Search times are impressive: only hundreds of milliseconds for 
multi-term queries.

Unfortunately, for the search to be useful, I need to pull back a 
page's worth of hits. In my case this is the first 25 results.

With the current out-of-the-box API, this causes 50 sequential RMI 
calls, which seriously increases the total time the client must wait 
for a response.

ParallelMultiSearcher itself is pretty reasonable, though I have my own 
re-implementation using the java.util.concurrent framework. However, the 
Lucene API is simply not optimised for retrieving Documents in bulk.
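
The re-implementation mentioned above is not shown in this message. As a 
minimal sketch of the general fan-out-and-merge pattern with 
java.util.concurrent (not ParallelMultiSearcher's actual code), the 
following assumes a hypothetical Shard interface and Hit class standing 
in for Lucene's Searchable and ScoreDoc. Each shard is searched 
concurrently, shard-local ids are shifted by a starts[] offset into one 
global id space, and the best n hits overall are kept:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ParallelTopHitsSketch {

    /** Hypothetical stand-in for a scored hit (not Lucene's ScoreDoc). */
    static final class Hit {
        final int doc;     // shard-local id until merged, then global id
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Hypothetical shard: returns up to n hits with shard-local ids. */
    interface Shard {
        List<Hit> topHits(String query, int n) throws Exception;
    }

    /** Searches all shards concurrently and merges their hits by score. */
    static List<Hit> search(Shard[] shards, int[] starts, String query, int n,
                            ExecutorService pool) throws Exception {
        // Fan out: one task per shard, all running at once.
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard shard : shards) {
            futures.add(pool.submit(() -> shard.topHits(query, n)));
        }
        // Gather: wait for each shard and map local ids to global ids.
        List<Hit> merged = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            for (Hit h : futures.get(i).get()) {
                merged.add(new Hit(h.doc + starts[i], h.score));
            }
        }
        // Merge: highest score first, then keep the overall top n.
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }
}
```

The total wait is roughly that of the slowest shard rather than the sum 
of all shards, which is why the search phase alone can stay fast.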

Obviously we can all work around it in different ways, but I feel that 
it should be core functionality.

Searchable could have a bulk retrieval method, and ParallelMultiSearcher 
could then execute it *in parallel* against each underlying searcher.
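
A minimal sketch of that bulk-retrieval idea, assuming a hypothetical 
BulkSearchable interface with a docs(int[]) method (not part of Lucene's 
API) and plain String values standing in for Document. Global ids are 
mapped to shards with a starts[] offset array, similar to the scheme 
MultiSearcher uses internally, so each shard receives a single bulk 
request and all shards are queried concurrently:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class BulkFetchSketch {

    /** Hypothetical bulk extension of Searchable; not Lucene's API. */
    interface BulkSearchable {
        /** Returns stored documents for shard-local ids, in the same order. */
        String[] docs(int[] localIds) throws Exception;
    }

    /** Fetches documents for global ids: one bulk call per shard, in parallel. */
    static String[] parallelDocs(BulkSearchable[] shards, int[] starts,
                                 int[] globalIds, ExecutorService pool) throws Exception {
        // Group result positions by the shard that owns each id.
        Map<Integer, List<Integer>> byShard = new HashMap<>();
        for (int pos = 0; pos < globalIds.length; pos++) {
            int shard = shardOf(starts, globalIds[pos]);
            byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(pos);
        }
        // Submit one bulk request per shard, using shard-local ids.
        Map<Integer, Future<String[]>> pending = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : byShard.entrySet()) {
            final int shard = e.getKey();
            final List<Integer> positions = e.getValue();
            final int[] local = new int[positions.size()];
            for (int j = 0; j < local.length; j++) {
                local[j] = globalIds[positions.get(j)] - starts[shard];
            }
            pending.put(shard, pool.submit(() -> shards[shard].docs(local)));
        }
        // Reassemble the documents in the caller's original order.
        String[] result = new String[globalIds.length];
        for (Map.Entry<Integer, Future<String[]>> e : pending.entrySet()) {
            String[] fetched = e.getValue().get();
            List<Integer> positions = byShard.get(e.getKey());
            for (int j = 0; j < fetched.length; j++) {
                result[positions.get(j)] = fetched[j];
            }
        }
        return result;
    }

    /** Returns the shard whose id range contains globalId (starts ascending). */
    static int shardOf(int[] starts, int globalId) {
        int shard = 0;
        for (int i = 1; i < starts.length; i++) {
            if (globalId >= starts[i]) shard = i;
        }
        return shard;
    }
}
```

With a page of 25 hits spread over seven servers, this issues at most 
seven concurrent bulk calls instead of one round-trip per hit.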

I've implemented it locally. If anyone feels that this addresses a 
genuine problem, let me know.

In short, should Lucene provide an efficient document paging facility, 
or is it not considered core?


P.S. I'm using a CVS snapshot of Lucene 1.9.

Nrupal Akolkar wrote:
> Hi,
> Try the following:
> 1. Write a subclass of the class containing the search(...) method 
> you listed, and declare it Serializable.
> 2. Have the subclass override the search method, with the same body as 
> in the superclass.
> 3. Rebuild your Lucene 1.** file with Ant, and try it the way 
> you want.
> I think this should solve your problem.
> Nrupal
>  On 6/24/05, Robert Newson <> wrote: 
>>Can Hits be made serializable?
>>I'm finding that almost all of the time for a remote search is spent
>>lazily retrieving document objects.
>>I'd like to create a remote interface with a method like:
>>Hits search(Query query, Filter filter, int prefetch)
>>The remote end would call Hits.doc() for the first $prefetch entries.
>>This will make a huge difference to remote searching performance:
>>total  fetch  server1  server2  server3
>>  862    699       86       69       96
>>For now, I'll use Document[] as the return value, but Hits feels more
>>To unsubscribe, e-mail:
>>For additional commands, e-mail:
