lucene-dev mailing list archives

From "Edward Drapkin (JIRA)" <>
Subject [jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime
Date Wed, 05 May 2010 19:17:02 GMT


Edward Drapkin commented on LUCENE-2447:

The problem isn't only that creating a MultiSearcher per request is too heavy.  If you
look at issue 2440, I also modified ParallelMultiSearcher to support a fixed thread pool.
What I'm worried about is that, even if each per-request searcher gets a small fixed pool of,
say, 4 threads, enough concurrent requests could push the number of threads the JVM has to
manage out of control.  If I can use the same ParallelMultiSearcher across requests, with a
fixed thread pool of something sane like 16 or 24 threads, then I can be reasonably sure that
this particular class isn't going to let thread counts spiral.
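
To make the thread-count concern concrete, here is a minimal sketch using only
java.util.concurrent.  It is not the code from the 2440 patch: SharedPoolSketch,
searchOneIndex and worstCaseThreads are hypothetical names, and searchOneIndex is just a
stand-in for searching one sub-index.  The point is that one pool shared by every request caps
the worker threads at the pool size, while a pool per request scales with the request count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedPoolSketch {
    // One pool for the whole application: never more than 16 worker threads,
    // no matter how many requests arrive. Daemon threads so the JVM can exit.
    private static final ExecutorService SHARED_POOL =
            Executors.newFixedThreadPool(16, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    /** Hypothetical stand-in for searching a single sub-index. */
    public static int searchOneIndex(int indexId, String query) {
        return indexId * query.length();
    }

    /** Fans one request out over the shared pool and sums the per-index results. */
    public static int searchAll(List<Integer> indexIds, String query) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int id : indexIds) {
            tasks.add(() -> searchOneIndex(id, query));
        }
        try {
            int total = 0;
            for (Future<Integer> f : SHARED_POOL.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Upper bound on worker threads when every request creates its own pool. */
    public static int worstCaseThreads(int poolSizePerRequest, int concurrentRequests) {
        return poolSizePerRequest * concurrentRequests;
    }
}
```

With a pool of 4 threads per request, 100 concurrent requests can mean up to 400 worker
threads; with the shared pool above the bound stays at 16 and extra tasks simply queue.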

As for stuffing everything into the same index, we've looked into that and determined that
it isn't a real possibility.  There are quite a few indexes, ranging from a few MB to a few GB
of data, so the merge process would be relatively expensive, and because the indexes
themselves are built and maintained separately, we'd need to run the merge too frequently
for it to be feasible.

> Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's
methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-2447
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>   Original Estimate: 0h
>  Remaining Estimate: 0h
> Here's the situation: We have a site with a fair number of indexes that we're using
MultiSearcher/ParallelMultiSearcher for, but the users can select an arbitrary combination
of indexes to search.  For example (contrived, but illustrative): the site has indexes numbered
1 - 10; user A wants to search all 10; user B wants to search indexes 1, 2 and 3; user
C wants to search the even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to
instantiate a new MultiSearcher for every combination of indexes that a user
wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that use the searchables
array (docFreq, search, rewrite and createDocFrequencyMap): a Set<Searchable>, which
is checked with isEmpty() and contains() on every iteration over searchables[].  The actual
logic has been moved into these methods, and the old methods have become overloads that pass
Collections.emptySet() into them, so an empty set means no subsetting.  I do not expect a
noticeable performance impact from this modification, if it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to illustrate that
subsetting of the search results works, since no other logic has changed.  If I need to do
more for the testing, let me know and I'll do it.
> I've attached the patches for, and
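
For readers without the patch at hand, the subsetting approach described above can be sketched
roughly as follows.  This is an illustration under stated assumptions, not the contents of
LUCENE-2447.patch: SubsetSketch and its nested SubSearcher interface are hypothetical
stand-ins for MultiSearcher and Lucene's Searchable, and only docFreq is shown.

```java
import java.util.Collections;
import java.util.Set;

public class SubsetSketch {
    /** Hypothetical stand-in for a searchable; only docFreq is modelled. */
    public interface SubSearcher {
        int docFreq(String term);
    }

    private final SubSearcher[] searchables;

    public SubsetSketch(SubSearcher... searchables) {
        this.searchables = searchables;
    }

    /** Old-style method: delegates with an empty set, meaning "use every searchable". */
    public int docFreq(String term) {
        return docFreq(term, Collections.<SubSearcher>emptySet());
    }

    /** New overload: only searchables in the subset contribute, unless the subset is empty. */
    public int docFreq(String term, Set<SubSearcher> subset) {
        int total = 0;
        for (SubSearcher s : searchables) {
            // isEmpty() and contains() are checked on every iteration, as described above.
            if (subset.isEmpty() || subset.contains(s)) {
                total += s.docFreq(term);
            }
        }
        return total;
    }
}
```

One instance built over all ten indexes could then serve users A, B and C by passing a
different Set per call, rather than constructing a new searcher for each selection.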

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
