From: Don Vaillancourt
To: "Lucene Users List"
Date: Mon, 26 Jul 2004 12:14:07 -0400
Subject: RE: Anyone use MultiSearcher class
Message-Id: <6.1.1.1.0.20040726121256.01b07ec0@localhost>
References: <6.1.1.1.0.20040726105814.01b99ec0@localhost>

Eh Mark,

Are you
involved with Lucene development?

At 11:39 AM 26/07/2004, you wrote:
>Don, at the low level, the issue isn't necessarily caching results
>from page-to-page (as viewed by some UI.) Such a cache would need to
>be co-ordinated with index writes.
>
>Rather, I plan to focus on the way Hits first reads 100 hits, then 200,
>then 400 and so on -- but all Hits knows about is the MultiSearcher.
>This means that in order to find the 101st hit, Hits effectively asks
>ALL the searchers in the MultiSearcher to search again -- even though it
>could be known that SOME of those searchers are incapable of returning
>results.
>
>-- Mark Florence
>
>-----Original Message-----
>From: Don Vaillancourt [mailto:donv@webimpact.com]
>Sent: Monday, July 26, 2004 11:06 am
>To: Lucene Users List; Lucene Users List
>Subject: RE: Anyone use MultiSearcher class
>
>
>Thanks for the info.
>
>Maybe the best solution to this may be to perform multiple individual
>searches, create a container class and store all the hits sorted by
>relevance within that class and then cache/serialize this result for the
>current search for page by page manipulation.
>
>
>At 09:46 AM 15/07/2004, Mark Florence wrote:
> >Don, I think I finally understand your problem -- and mine -- with
> >MultiSearcher. I had tested an implementation of my system using
> >ParallelMultiSearcher to split a huge index over many computers.
> >I was very impressed by the results on my test data, but alarmed
> >after a trial with live data :)
> >
> >Consider MultiSearcher.search(Query Q). Suppose that Q aggregated
> >over ALL the Searchables in the MultiSearcher would return 1000
> >documents. But, the Hits object created by search() will only cache
> >the first 100 documents. When Hits.doc(101) is called, Hits will
> >cache 200 documents -- then 400, 800, 1600 and so on. How does Hits
> >get these extra documents? By calling the MultiSearcher again.
> >
> >Now consider a MultiSearcher as described above with 2 Searchables.
> >With respect to Q, Searchable S has 1000 documents, Searchable T
> >has zero. So to fetch the 101st document, not only is S searched,
> >but T is too, even though the result of Q applied to T is still zero
> >and will always be zero. The same thing will happen when fetching
> >the 201st, 401st and 801st document.
> >
> >This accounts for my slow performance, and I think yours too. That
> >your observed degradation is a power of 2 is a clue.
> >
> >My performance is especially vulnerable because "slave" Searchables
> >in the MultiSearcher are Remote -- accessed via RMI.
> >
> >I guess I have to code smarter around MultiSearcher. One problem
> >you highlight is that Hits is final -- so it is not possible even to
> >modify the "100/200/400" cache size logic.
> >
> >Any ideas from anyone would be much appreciated.
> >
> >Mark Florence
> >CTO, AIRS
> >800-897-7714 x 1703
> >mflorence@airsmail.com
> >
> >-----Original Message-----
> >From: Don Vaillancourt [mailto:donv@webimpact.com]
> >Sent: Monday, July 12, 2004 12:36 pm
> >To: Lucene Users List
> >Subject: Anyone use MultiSearcher class
> >
> >
> >Hello,
> >
> >Has anyone used the Multisearcher class?
> >
> >I have noticed that searching two indexes using this MultiSearcher class
> >takes 8 times longer than searching only one index. I could understand if
> >it took 3 to 4 times longer to search due to sorting the two search results
> >and stuff, but why 8 times longer.
> >
> >Is there some optimization that can be done to hasten the search? Or
> >should I just write my own MultiSearcher. The problem though is that there
> >is no way for me to create my own Hits object (no methods are available and
> >the class is final).
> >
> >Anyone have any clue?
> >
> >Thanks
> >
> >
> >Don Vaillancourt
> >Director of Software Development
> >
> >WEB IMPACT INC.
> >416-815-2000 ext. 245
> >email: donv@web-impact.com
> >web: http://www.web-impact.com
> >
> >This email message is intended only for the addressee(s)
> >and contains information that may be confidential and/or
> >copyright. If you are not the intended recipient please
> >notify the sender by reply email and immediately delete
> >this email. Use, disclosure or reproduction of this email
> >by anyone other than the intended recipient(s) is strictly
> >prohibited. No representation is made that this email or
> >any attachments are free of viruses. Virus scanning is
> >recommended and is the responsibility of the recipient.
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Don Vaillancourt
Director of Software Development

WEB IMPACT INC.
416-815-2000 ext. 245
email: donv@web-impact.com
web: http://www.web-impact.com
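
To put rough numbers on the re-search pattern Mark describes, here is a small sketch in plain Java. It is only a toy model of the thread's description, not Lucene's actual Hits or MultiSearcher code. The class HitsDoublingModel and its methods are made up for illustration, the starting batch of 100 with doubling is taken from the messages above, and the "skipping" variant assumes the indexes do not change between rounds.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the fetch pattern described in this thread; hypothetical, not Lucene code. */
class HitsDoublingModel {

    /** Cache sizes requested until the cache covers the wanted hit (1-based): 100, 200, 400, ... */
    static List<Integer> cacheSizes(int wantedHit, int firstBatch) {
        if (firstBatch <= 0) {
            throw new IllegalArgumentException("firstBatch must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int size = firstBatch;
        sizes.add(size);
        while (size < wantedHit) {
            size *= 2;          // each round doubles the requested cache size
            sizes.add(size);
        }
        return sizes;
    }

    /** Sub-searches issued when every round re-queries every searchable. */
    static int naiveSubSearches(int wantedHit, int firstBatch, int[] matchesPerSearchable) {
        int rounds = cacheSizes(wantedHit, firstBatch).size();
        return rounds * matchesPerSearchable.length;
    }

    /**
     * Sub-searches if searchables that matched nothing in round one are
     * skipped in later rounds (assumes the indexes are not modified meanwhile).
     */
    static int skippingSubSearches(int wantedHit, int firstBatch, int[] matchesPerSearchable) {
        int nonEmpty = 0;
        for (int matches : matchesPerSearchable) {
            if (matches > 0) {
                nonEmpty++;
            }
        }
        int rounds = cacheSizes(wantedHit, firstBatch).size();
        // Round one must ask everyone; later rounds only ask the non-empty ones.
        return matchesPerSearchable.length + (rounds - 1) * nonEmpty;
    }

    public static void main(String[] args) {
        int[] matches = {1000, 0}; // Searchable S has 1000 matches, T has none
        for (int wanted : new int[] {101, 201, 401, 801}) {
            System.out.println("hit " + wanted
                    + ": rounds=" + cacheSizes(wanted, 100).size()
                    + " naive=" + naiveSubSearches(wanted, 100, matches)
                    + " skipping=" + skippingSubSearches(wanted, 100, matches));
        }
    }
}
```

In this model, reaching the 801st hit takes five rounds (100, 200, 400, 800, 1600), so ten sub-searches when both S and T are asked every time, against six when T is dropped after the first round. With remote Searchables each of those saved sub-searches would be a saved RMI round trip.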