From: "Edward Drapkin (JIRA)"
To: dev@lucene.apache.org
Date: Wed, 5 May 2010 15:17:02 -0400 (EDT)
Subject: [jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime
Message-ID: <11059004.27971273087022933.JavaMail.jira@thor>
In-Reply-To: <18448830.25871273081203735.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864454#action_12864454 ]

Edward Drapkin commented on LUCENE-2447:
----------------------------------------

It's not only that creating a MultiSearcher per request is too heavy. If you look at 2440, I also modified ParallelMultiSearcher to support a fixed thread pool. What worries me is that, even with a small fixed pool of something like 4 threads per searcher, creating one searcher per request means the number of concurrent requests could drive the number of threads the JVM has to manage out of control. If I can share the same ParallelMultiSearcher across requests, with a fixed thread pool of something sane like 16 or 24 threads, then I can be reasonably sure that this particular class isn't going to make thread counts spiral.

As for putting everything into the same index, we looked into that and determined that it isn't a realistic option. There are quite a few indexes, ranging from a few MB to a few GB of data, so the merge process would be relatively expensive. On top of that, the indexes are built and maintained separately, so we would need to run the merge too frequently for it to be feasible.

> Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2447
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair number of indexes that we search with MultiSearcher/ParallelMultiSearcher, but the users can select an arbitrary subset of those indexes to search.
> For example (contrived, but illustrative): the site has indexes numbered 1 - 10; user A wants to search all 10; user B wants to search indexes 1, 2 and 3; user C wants to search the even-numbered indexes. As of Lucene 3.0.1, the only way to do this is to instantiate a new MultiSearcher for every combination of indexes that a user wants, which is not ideal at all.
>
> What I've done is add a new parameter to all methods in MultiSearcher that use the searchables array (docFreq, search, rewrite and createDocFrequencyMap): a Set that is checked with isEmpty() and contains() on every iteration over searchables[]. The actual logic has been moved into these new methods, and the old methods have become overloads that pass Collections.emptySet() to them, so I do not expect a noticeable performance impact from this modification, if it is measurable at all.
>
> I didn't modify the test for MultiSearcher very much, just enough to illustrate that subsetting of the search results works, since no other logic has changed. If I need to do more for the testing, let me know and I'll do it.
>
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java and TestMultiSearcher.java.
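
For readers without the patch at hand, the overload pattern described in the issue (an extra Set parameter checked with isEmpty() and contains() while iterating over the searchables, with the old method delegating via Collections.emptySet()) might look roughly like the sketch below. This is not the attached patch and it does not use Lucene classes: SimpleSearchable, FixedFreqSearchable and SubsetMultiSearcherSketch are hypothetical stand-ins, only docFreq is modelled, and it assumes the Set holds the searchable instances themselves and that an empty set means "use every searchable".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a Lucene Searchable; only docFreq is modelled. */
interface SimpleSearchable {
    int docFreq(String term);
}

/** Hypothetical searchable that reports the same frequency for every term. */
final class FixedFreqSearchable implements SimpleSearchable {
    private final int freq;

    FixedFreqSearchable(int freq) {
        this.freq = freq;
    }

    @Override
    public int docFreq(String term) {
        return freq;
    }
}

/** Sketch of the subset-aware overload pattern; an assumption, not the real patch. */
final class SubsetMultiSearcherSketch {
    private final SimpleSearchable[] searchables;

    SubsetMultiSearcherSketch(SimpleSearchable... searchables) {
        this.searchables = searchables;
    }

    /** Old-style method: delegates with an empty set, which means "all searchables". */
    int docFreq(String term) {
        return docFreq(term, Collections.<SimpleSearchable>emptySet());
    }

    /** New-style method: sums docFreq over the subset, or over everything if it is empty. */
    int docFreq(String term, Set<SimpleSearchable> subset) {
        int total = 0;
        for (SimpleSearchable s : searchables) {
            // Skip searchables that the caller did not select.
            if (subset.isEmpty() || subset.contains(s)) {
                total += s.docFreq(term);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        SimpleSearchable a = new FixedFreqSearchable(1);
        SimpleSearchable b = new FixedFreqSearchable(10);
        SimpleSearchable c = new FixedFreqSearchable(100);
        SubsetMultiSearcherSketch searcher = new SubsetMultiSearcherSketch(a, b, c);

        // All three searchables contribute: prints 111.
        System.out.println(searcher.docFreq("lucene"));

        // Only a and c are selected: prints 101.
        Set<SimpleSearchable> subset = new HashSet<SimpleSearchable>(Arrays.asList(a, c));
        System.out.println(searcher.docFreq("lucene", subset));
    }
}
```

With this shape a single long-lived searcher instance can serve every request, and each request only supplies its own Set. The extra cost per searchable is one isEmpty() call and, for non-empty sets, one contains() lookup.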