lucene-dev mailing list archives

From "Edward Drapkin (JIRA)" <>
Subject [jira] Commented: (LUCENE-2447) Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's methods at runtime
Date Wed, 05 May 2010 21:32:03 GMT


Edward Drapkin commented on LUCENE-2447:

Ah, cool, regarding LUCENE-2440 :)

You mention that the current API can already accomplish this by instantiating a MultiSearcher
per request. That's true, but I think this approach would be much simpler. It does increase
the complexity of the API, but in a consistent way that's easy to understand and use (and
doesn't break BC). If the proposed API differs too much from the current one, maybe the
solution is to split the change into a new class (i.e. two classes: MultiSearcher and
SplittableMultiSearcher). Either way, under the current API, calls look like this:

  public void doSearch() {
    Set<Searchable> searchables = this.getSearchablesFromRequestParams(); //faux method

    MultiSearcher mSearcher = new MultiSearcher(searchables);
    mSearcher.search(someQuery, 1000);
  }

Compare with, under my proposed API:

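  public void doSearch() {
    Set<Searchable> searchables = this.getSearchablesFromRequestParams(); //faux method

    this.mSearcher.search(searchables, someQuery, 1000);
  }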

I don't think this is an entirely esoteric/niche requirement (surely I can't be the only one
who has this issue), and the change doesn't break any existing code or significantly increase
its execution time. The end result is much cleaner code (from userland) that's also less
resource intensive, especially regarding memory usage and garbage collection times. However
cheap it may be to instantiate a MultiSearcher (on my completely idle Q9300 it takes about
3us, or about 20us for ParallelMultiSearcher*), it's still more expensive than keeping one
instance around, especially in a heavily trafficked environment.

* I created 100 indexes, each with 10,000 documents (each of which had 100 fields named name1,
name2, etc. with 128 bytes of random string) and then tested that - each index was ~60MB.
 I can paste the code I used if you would like.

> Add support for subsets of searchables inside a MultiSearcher/ParallelMultiSearcher instance's
methods at runtime
> -----------------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-2447
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>         Environment: Irrelevant
>            Reporter: Edward Drapkin
>            Priority: Minor
>         Attachments: LUCENE-2447.patch
>   Original Estimate: 0h
>  Remaining Estimate: 0h
> Here's the situation: We have a site with a fair number of indexes that we're using
MultiSearcher/ParallelMultiSearcher for, but the users can select an arbitrary combination
of indexes to search.  For example (contrived, but illustrative): the site has indexes numbered
1 - 10; user A wants to search all 10; user B wants to search indexes 1, 2 and 3; user
C wants to search the even-numbered indexes.  As of Lucene 3.0.1, the only way to do this is to
instantiate a new MultiSearcher for every combination of indexes that a user
wants, which is not ideal at all.
> What I've done is add a new parameter to all methods in MultiSearcher that use the searchables
array (docFreq, search, rewrite and createDocFrequencyMap), a Set<Searchable> which
is checked for isEmpty() and contains() for every iteration over the searchables[].  The actual
logic has been moved into these methods and the old methods have become overloads that pass
a Collections.emptySet() into those methods, so I do not expect there to be a very noticeable
performance impact as a result of this modification, if it's measurable at all.
> I didn't modify the test for MultiSearcher very much, just enough to illustrate that
subsetting of the search results works, since no other logic has changed.  If I need to do
more for the testing, let me know and I'll do it.
> I've attached the patches for, and
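
The overload pattern described in the issue text can be sketched roughly as follows. This is
only an illustration and not the attached patch: Source is a hypothetical stand-in for
Lucene's Searchable, SubsetSearcher and fixed() are made-up names, and only docFreq is shown.
The assumption taken from the description is that an empty set means "use every searchable".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SubsetSearchSketch {

    /** Hypothetical stand-in for Lucene's Searchable; not the real interface. */
    public interface Source {
        int docFreq(String term);
    }

    /** Holds every source once and filters per call instead of per instance. */
    public static final class SubsetSearcher {
        private final Source[] sources;

        public SubsetSearcher(Source... sources) {
            this.sources = sources;
        }

        /** Old-style call: delegates with an empty set, meaning "use all sources". */
        public int docFreq(String term) {
            return docFreq(term, Collections.<Source>emptySet());
        }

        /** New-style call: only sources contained in the subset are consulted. */
        public int docFreq(String term, Set<Source> subset) {
            int total = 0;
            for (Source source : sources) {
                // An empty subset disables filtering; otherwise skip unselected sources.
                if (subset.isEmpty() || subset.contains(source)) {
                    total += source.docFreq(term);
                }
            }
            return total;
        }
    }

    /** Illustrative helper: a source that reports a fixed frequency for every term. */
    public static Source fixed(final int freq) {
        return term -> freq;
    }

    public static void main(String[] args) {
        Source a = fixed(1);
        Source b = fixed(2);
        Source c = fixed(4);
        SubsetSearcher searcher = new SubsetSearcher(a, b, c);

        Set<Source> subset = new HashSet<>(Arrays.asList(a, c));
        System.out.println(searcher.docFreq("term"));         // all three sources
        System.out.println(searcher.docFreq("term", subset)); // only a and c
    }
}
```

One long-lived instance serves every request, and the per-call cost is an isEmpty() check plus
at most one contains() lookup per searchable.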

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
