lucene-java-user mailing list archives

From "Mark Florence" <mflore...@airsmail.com>
Subject RE: Anyone use MultiSearcher class
Date Thu, 15 Jul 2004 13:46:00 GMT
Don, I think I finally understand your problem -- and mine -- with
MultiSearcher. I had tested an implementation of my system using
ParallelMultiSearcher to split a huge index over many computers.
I was very impressed by the results on my test data, but alarmed
after a trial with live data :)

Consider MultiSearcher.search(Query Q). Suppose that Q aggregated
over ALL the Searchables in the MultiSearcher would return 1000
documents. But the Hits object created by search() will only cache
the first 100 documents. When the 101st document is requested
(Hits.doc(100), since Hits is zero-indexed), Hits will cache 200
documents -- then 400, 800, 1600 and so on. How does Hits get these
extra documents? By calling the MultiSearcher again, which re-runs
the whole query with the larger limit.
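
To make the doubling concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class and method names are hypothetical, and the "start at 100, double on every miss" rule is taken from the description above rather than from the Hits source. It lists the result limits that would be requested from the searcher before a given document rank becomes available.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Hits cache growth described above (not Lucene code).
class HitsCacheSketch {

    /**
     * Returns the result limit passed to the searcher on each search that is
     * needed before the document at the given 1-based rank is cached.
     * Assumption: the first search asks for 100 documents and every miss
     * doubles the limit and re-runs the query.
     */
    static List<Integer> fetchSizes(int rank, int totalMatches) {
        List<Integer> sizes = new ArrayList<>();
        int request = 100;
        while (true) {
            sizes.add(request);
            int cached = Math.min(request, totalMatches);
            // Stop once the rank is cached or there is nothing more to fetch.
            if (rank <= cached || cached == totalMatches) {
                break;
            }
            request *= 2;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(fetchSizes(101, 1000));  // prints [100, 200]
        System.out.println(fetchSizes(1000, 1000)); // prints [100, 200, 400, 800, 1600]
    }
}
```

Under this model, walking through all 1000 hits costs five full searches instead of one.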

Now consider a MultiSearcher as described above with 2 Searchables.
With respect to Q, Searchable S has 1000 documents, Searchable T
has zero. So to fetch the 101st document, not only is S searched,
but T is too, even though the result of Q applied to T is still zero
and will always be zero. The same thing will happen when fetching
the 201st, 401st and 801st document.
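
The S/T example can be counted out with a small simulation. This is again a sketch in plain Java with hypothetical names, assuming the same "100, then double" rule and assuming that every re-search is forwarded to every Searchable regardless of how many hits it returned before.

```java
import java.util.Arrays;

// Hypothetical model of how often each Searchable is queried (not Lucene code).
class MultiSearcherCallCount {

    /**
     * matchCounts[i] is the number of documents Searchable i has for the query.
     * Returns how many times each Searchable is searched before the document
     * at the given 1-based rank is cached.
     */
    static int[] callsPerSearchable(int[] matchCounts, int rank) {
        int total = 0;
        for (int c : matchCounts) {
            total += c;
        }
        int[] calls = new int[matchCounts.length];
        int request = 100;
        while (true) {
            // Every round queries every Searchable, even one with zero hits.
            for (int i = 0; i < calls.length; i++) {
                calls[i]++;
            }
            int cached = Math.min(request, total);
            if (rank <= cached || cached == total) {
                break;
            }
            request *= 2;
        }
        return calls;
    }

    public static void main(String[] args) {
        int[] sAndT = {1000, 0}; // S has 1000 matches, T has none
        System.out.println(Arrays.toString(callsPerSearchable(sAndT, 101)));  // prints [2, 2]
        System.out.println(Arrays.toString(callsPerSearchable(sAndT, 1000))); // prints [5, 5]
    }
}
```

In this model T is searched five times while paging through S's 1000 hits, and contributes nothing each time.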

This accounts for my slow performance, and I think yours too. The
fact that your observed slowdown (8 times) is a power of 2 is a clue.

My performance is especially vulnerable because the "slave"
Searchables in my MultiSearcher are Remote -- accessed via RMI -- so
every repeated search is another network round trip.

I guess I have to code smarter around MultiSearcher. One problem
you highlight is that Hits is final -- so it is not possible even to
modify the "100/200/400" cache size logic.
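
One possible direction for "coding smarter", sketched here as a simulation only: remember how many hits each Searchable reported and stop re-querying any Searchable that has already returned everything it has. All names below are hypothetical, nothing here is Lucene API, and the sketch assumes the indexes do not change between rounds.

```java
import java.util.Arrays;

// Hypothetical simulation of skipping exhausted Searchables (not Lucene code).
class ExhaustedSearchableSketch {

    /**
     * Same doubling model as before, except that a Searchable which has
     * already handed back all of its matches is not queried again.
     * Returns the number of calls made to each Searchable.
     */
    static int[] remoteCalls(int[] matchCounts, int rank) {
        int total = 0;
        for (int c : matchCounts) {
            total += c;
        }
        int[] calls = new int[matchCounts.length];
        int[] returned = new int[matchCounts.length]; // docs from each one's last call
        int request = 100;
        while (true) {
            for (int i = 0; i < matchCounts.length; i++) {
                // After the first call we know the total for this Searchable.
                boolean exhausted = calls[i] > 0 && returned[i] >= matchCounts[i];
                if (!exhausted) {
                    calls[i]++;
                    returned[i] = Math.min(request, matchCounts[i]);
                }
            }
            int cached = Math.min(request, total);
            if (rank <= cached || cached == total) {
                break;
            }
            request *= 2;
        }
        return calls;
    }

    public static void main(String[] args) {
        // S has 1000 matches, T has none: T is asked once and then left alone.
        System.out.println(Arrays.toString(remoteCalls(new int[]{1000, 0}, 1000))); // prints [5, 1]
    }
}
```

This only removes the wasted calls to T. S is still searched five times, so the doubling cache itself would need a separate fix, for example asking for a larger result set up front.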

Any ideas from anyone would be much appreciated.

Mark Florence
CTO, AIRS
800-897-7714 x 1703
mflorence@airsmail.com




-----Original Message-----
From: Don Vaillancourt [mailto:donv@webimpact.com]
Sent: Monday, July 12, 2004 12:36 pm
To: Lucene Users List
Subject: Anyone use MultiSearcher class


Hello,

Has anyone used the MultiSearcher class?

I have noticed that searching two indexes using this MultiSearcher class
takes 8 times longer than searching only one index.  I could understand if
it took 3 to 4 times longer to search due to sorting the two search results
and stuff, but why 8 times longer?

Is there some optimization that can be done to speed up the search?  Or
should I just write my own MultiSearcher?  The problem though is that there
is no way for me to create my own Hits object (no methods are available and
the class is final).

Anyone have any clue?

Thanks


Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: donv@web-impact.com
web: http://www.web-impact.com



