lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] Commented: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
Date Tue, 21 Nov 2006 19:17:03 GMT
Michael Busch commented on LUCENE-538:

The reason for this problem is how the MultiSearcher rewrites queries. It calls rewrite()
on all Searchables and combines the rewritten queries thereafter. 

And here is the bug:
Let's say we have the query +a -b* and two Searchables. The dictionary of the first Searchable's
index has two expansions for b*, so calling rewrite() on the first Searchable results in the
query +a -(b1 b2). However, the dictionary of the second Searchable's index does not have any
expansions, so the second rewritten query is +a -(). To combine these two queries, the MultiSearcher
creates a new BooleanQuery and adds both rewritten queries as SHOULD clauses, so the combined
query looks like: (+a -(b1 b2)) (+a -()). This query is used to search both indexes. As a
result, every document that contains 'a' is found, because the negative clause within the second
SHOULD clause is empty. That's why too many results are returned from the first index: the
-b* has no effect at all anymore.
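
To make the failure concrete, here is a minimal sketch in plain Java. It is a toy model of
Boolean matching, not Lucene code: the class MultiSearcherRewriteDemo, the Query interface and
all helper names are made up for illustration, and documents are reduced to sets of terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the combined query; hypothetical names, not the Lucene API. */
final class MultiSearcherRewriteDemo {

    /** A query either matches or rejects a document, modelled as a set of terms. */
    interface Query {
        boolean matches(Set<String> docTerms);
    }

    static Query term(String t) {
        return doc -> doc.contains(t);
    }

    /** SHOULD-only query: matches if any clause matches; with no clauses it matches nothing. */
    static Query anyOf(Query... clauses) {
        return doc -> Arrays.stream(clauses).anyMatch(q -> q.matches(doc));
    }

    /** Models "+must -mustNot". */
    static Query mustAndNot(Query must, Query mustNot) {
        return doc -> must.matches(doc) && !mustNot.matches(doc);
    }

    /** Expands "prefix*" against ONE index's term dictionary, as a per-Searchable rewrite would. */
    static Query expandPrefix(String prefix, Set<String> dictionary) {
        List<Query> expansions = new ArrayList<>();
        for (String t : dictionary) {
            if (t.startsWith(prefix)) {
                expansions.add(term(t));
            }
        }
        return anyOf(expansions.toArray(new Query[0]));
    }

    public static void main(String[] args) {
        Set<String> dict1 = new HashSet<>(Arrays.asList("a", "b1", "b2")); // first index
        Set<String> dict2 = new HashSet<>(Arrays.asList("a"));             // second index

        Query rewritten1 = mustAndNot(term("a"), expandPrefix("b", dict1)); // +a -(b1 b2)
        Query rewritten2 = mustAndNot(term("a"), expandPrefix("b", dict2)); // +a -()
        Query combined = anyOf(rewritten1, rewritten2); // (+a -(b1 b2)) (+a -())

        // A document from the first index that +a -b* should exclude.
        Set<String> doc = new HashSet<>(Arrays.asList("a", "b1"));

        System.out.println(rewritten1.matches(doc)); // false: excluded, as intended
        System.out.println(rewritten2.matches(doc)); // true: the empty MUST_NOT excludes nothing
        System.out.println(combined.matches(doc));   // true: the document leaks into the results
    }
}
```

The second SHOULD clause accepts every document containing 'a', so the exclusion expressed by
the first clause can never take effect in the combined query.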

The workaround Paul suggested works because it calls rewrite() on a MultiReader instead of on the
MultiSearcher. The b* is then expanded using the merged dictionaries from both indexes. So this
workaround simply hides the problem in MultiSearcher.
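
For comparison, the following sketch shows why expanding against the merged dictionaries avoids
the problem. Again this is a toy model in plain Java rather than Lucene code; the class
MergedDictionaryRewriteDemo and its methods are hypothetical names, and documents are reduced
to sets of terms.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Toy model of rewriting once against the union of all dictionaries; hypothetical names. */
final class MergedDictionaryRewriteDemo {

    /** Returns the dictionary terms that start with the given prefix. */
    static Set<String> expand(String prefix, Set<String> dictionary) {
        Set<String> expansions = new TreeSet<>();
        for (String t : dictionary) {
            if (t.startsWith(prefix)) {
                expansions.add(t);
            }
        }
        return expansions;
    }

    /** Models "+required -(excluded...)": the doc has the required term and no excluded term. */
    static Predicate<Set<String>> requireButExclude(String required, Set<String> excluded) {
        return doc -> doc.contains(required) && Collections.disjoint(doc, excluded);
    }

    /** Merges all dictionaries first, then expands the prefix once over the merged terms. */
    static Predicate<Set<String>> rewriteOnMergedDictionary(
            String required, String prefix, List<Set<String>> dictionaries) {
        Set<String> merged = new TreeSet<>();
        for (Set<String> d : dictionaries) {
            merged.addAll(d);
        }
        return requireButExclude(required, expand(prefix, merged));
    }

    public static void main(String[] args) {
        Set<String> dict1 = new HashSet<>(Arrays.asList("a", "b1", "b2")); // first index
        Set<String> dict2 = new HashSet<>(Arrays.asList("a"));             // second index

        // One rewritten query for both indexes: +a -(b1 b2)
        Predicate<Set<String>> query =
                rewriteOnMergedDictionary("a", "b", Arrays.asList(dict1, dict2));

        System.out.println(query.test(new HashSet<>(Arrays.asList("a", "b1")))); // false
        System.out.println(query.test(new HashSet<>(Arrays.asList("a"))));       // true
    }
}
```

Because there is only one rewritten query, no clause with an empty negative part is ever
created, which is why the workaround never triggers the combining step described above.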

> Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
> -------------------------------------------------------------------
>                 Key: LUCENE-538
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 1.9
>         Environment: Ubuntu Linux, java version 1.5.0_04
>            Reporter: Helen Warren
>         Attachments:
> We are searching across multiple indices using a MultiSearcher. There seems to be a problem
> when we use a WildcardQuery to exclude documents from the result set. I attach a set of unit
> tests illustrating the problem.
> In these tests, we have two indices. Each index contains a set of documents with fields
> for 'title', 'section' and 'index'. The final aim is to do a keyword search, across both
> indices, on the title field and be able to exclude documents from certain sections (and their
> subsections) using a WildcardQuery on the section field.
>  e.g. return documents from both indices which have the string 'xyzpqr' in their title
>  but which do not lie in the news section or its subsections (section = /news/*).
> The first unit test (testExcludeSectionsWildCard) fails trying to do this.
> If we relax any of the constraints made above, tests pass:
> * Don't use WildcardQuery, but pass in the news section and its child section to exclude
>   explicitly (testExcludeSectionsExplicit).
> * Exclude results from just one section, not its children too, i.e. don't use WildcardQuery
>   (testExcludeSingleSection).
> * Do use WildcardQuery, and exclude a section and its children, but just use one index,
>   thereby using the simple IndexReader and IndexSearcher objects (testExcludeSectionsOneIndex).
> * Try the boolean MUST clause rather than MUST_NOT using the WildcardQuery, i.e. only
>   include results from the /news/ section and its children.
