Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Message-ID: <11059004.27971273087022933.JavaMail.jira@thor>
Date: Wed, 5 May 2010 15:17:02 -0400 (EDT)
From: "Edward Drapkin (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2447) Add support for subsets of
 searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods
 at runtime
In-Reply-To: <18448830.25871273081203735.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864454#action_12864454 ] 

Edward Drapkin commented on LUCENE-2447:
----------------------------------------

The problem isn't only that creating a MultiSearcher per request is too heavy.  If you look at LUCENE-2440, I also modified ParallelMultiSearcher to support a fixed thread pool.  What worries me is that with one searcher per request, even a small fixed pool of something like 4 threads gets multiplied by the number of concurrent requests, so the total number of threads the JVM has to deal with could spiral out of control.  If I can use the same ParallelMultiSearcher across requests, with a fixed thread pool of a sane size like 16 or 24 threads, then I can be reasonably sure that this particular class isn't going to be the cause of runaway thread counts.  
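
To make the thread-count argument concrete, here is a minimal sketch in plain java.util.concurrent.  It does not use the Lucene API or the LUCENE-2440 patch; the class name SharedPoolSketch, its methods, and the fake per-index "hit counts" are all assumptions made up for illustration.  It only shows that when every request goes through one shared fixed pool, the number of worker threads stays bounded by the pool size no matter how many requests are served.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; not part of Lucene or of any attached patch. */
final class SharedPoolSketch {
    // One fixed pool shared by every request, instead of one pool per request.
    private final ExecutorService pool;
    // Records which worker threads actually ran tasks, so the bound can be observed.
    private final Set<String> workerNames = ConcurrentHashMap.newKeySet();

    SharedPoolSketch(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
    }

    /** Simulates one request that fans out to subIndexCount sub-searches. */
    int handleRequest(int subIndexCount) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < subIndexCount; i++) {
            final int fakeHits = i + 1; // stand-in for a real per-index hit count
            tasks.add(() -> {
                workerNames.add(Thread.currentThread().getName());
                return fakeHits;
            });
        }
        int total = 0;
        try {
            // invokeAll blocks until every sub-search has completed.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total;
    }

    /** Number of distinct worker threads that have run at least one task. */
    int distinctWorkerThreads() {
        return workerNames.size();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With a per-request pool, N concurrent requests would hold N times the pool size in threads; with the shared pool above, extra sub-searches queue up instead of creating more threads.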

As for stuffing everything into the same index, we've looked into that and determined that it isn't a real possibility.  There are quite a few indexes, ranging from a few MB to a few GB of data, so the merge process would be relatively expensive, and because the indexes themselves are built and maintained separately, we'd need to run the merge too frequently for it to be feasible.  

> Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2447
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2447
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Here's the situation: We have a site with a fair number of indexes that we're using MultiSearcher/ParallelMultiSearcher for, but the users can select an arbitrary subset of indexes to search.  For example (contrived, but illustrative): the site has indexes numbered 1 - 10; user A wants to search in all 10; user B wants to search indexes 1, 2 and 3; user C wants to search the even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to instantiate a new MultiSearcher for every combination of indexes that a user wants, which is not ideal at all.
> What I've done is add a new parameter, a Set<Searchable>, to all methods in MultiSearcher that use the searchables array (docFreq, search, rewrite and createDocFrequencyMap); the set is checked with isEmpty() and contains() on every iteration over searchables[].  The actual logic has been moved into these new methods, and the old methods have become overloads that pass Collections.emptySet() into them, so I do not expect a noticeable performance impact from this modification, if it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to illustrate that subsetting of the search results works, since no other logic has changed.  If I need to do more for the testing, let me know and I'll do it.
> I've attached the patches for MultiSearcher.java, ParallelMultiSearcher.java and TestMultiSearcher.java.
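
For readers without the patch at hand, the subset check described above might look roughly like the sketch below.  This is not the attached patch: SubsetSearchSketch and SubSearcher are hypothetical stand-ins (SubSearcher models only a docFreq call, not Lucene's Searchable interface), and treating an empty set as "search everything" is an assumption inferred from the old overloads passing Collections.emptySet().

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical illustration only; not the code from LUCENE-2447.patch. */
final class SubsetSearchSketch {
    /** Stand-in for a sub-searcher; only a docFreq-style call is modeled. */
    interface SubSearcher {
        int docFreq(String term);
    }

    private final SubSearcher[] searchables;

    SubsetSearchSketch(SubSearcher... searchables) {
        this.searchables = searchables;
    }

    /** Old signature: delegates with an empty set, meaning "no restriction". */
    int docFreq(String term) {
        return docFreq(term, Collections.<SubSearcher>emptySet());
    }

    /** New signature: only sub-searchers contained in the subset are consulted. */
    int docFreq(String term, Set<SubSearcher> subset) {
        int total = 0;
        for (SubSearcher s : searchables) {
            // Assumed semantics: an empty set includes every sub-searcher.
            if (subset.isEmpty() || subset.contains(s)) {
                total += s.docFreq(term);
            }
        }
        return total;
    }
}
```

The extra cost per sub-searcher is one isEmpty() call and, for a non-empty set, one contains() lookup, which is constant time on average for a HashSet.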
