lucene-dev mailing list archives

From "Xiaozheng Ma" <Xiaozheng...@redwood.com>
Subject RE: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits
Date Wed, 10 Nov 2004 21:47:55 GMT
Erik,

Thanks for your reply.

I think it failed because of the assertions for query 2 and query 3,
which are defined as:
        Query query2 = new WildcardQuery(new Term("body", "metal?"));
        Query query3 = new WildcardQuery(new Term("body", "metals?"));
With the patch applied, the assertion for query2 should read:
        assertMatches(searcher, query2, 1);
        // changed from 2, since 'metal' is no longer a match
The same kind of change applies to query3's assertion:
        assertMatches(searcher, query3, 0);
        // changed from 1, since there is no longer any match

I added query 6, and it passes the test once these two modifications
are made to the test case.
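
As an independent sanity check on those expected counts, here is a small
standalone sketch that translates each wildcard pattern into a
java.util.regex pattern, with '?' meaning exactly one character. It does
not use Lucene at all. The class name, the helper names, and in
particular the list of indexed terms are assumptions made for
illustration; the thread does not show what the test index contains.

```java
import java.util.regex.Pattern;

class WildcardExpectations {
    // Translate a wildcard pattern into a regex:
    // '?' matches exactly one character, '*' matches any run (including empty).
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                // Quote literals so regex metacharacters are taken verbatim.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Count how many terms match the whole wildcard pattern.
    static int countMatches(String wildcard, String[] terms) {
        Pattern p = toRegex(wildcard);
        int count = 0;
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical index contents, not taken from the thread.
        String[] terms = {"metal", "metals", "mXtals", "mXtXls"};
        for (String q : new String[] {"metal?", "metals?", "metal??"}) {
            System.out.println(q + " -> " + countMatches(q, terms));
        }
    }
}
```

Under that assumed term list, the counts agree with the assertions
discussed here: 1 for "metal?", 0 for "metals?", and 0 for "metal??".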

Thanks!
---
Xiaozheng

-------------------------------

I tried your patch locally by adding a test case to testQuestionmark of
TestWildcardQuery:

         Query query6 = new WildcardQuery(new Term("body", "metal??"));
         assertMatches(searcher, query6, 0);

I was not able to get it to work properly, as this test case failed  
after adding your patch.  Could you enhance this test case to include  
the bug you're fixing so that we can show that your implementation  
works properly?  I'd commit it if I can get this test case to pass :)

	Erik


On Nov 10, 2004, at 11:06 AM, Xiaozheng Ma wrote:

>
> Hi all,
>
> I sent a patch regarding wildcard search a couple of days ago (that was
> my first time sending anything to the list). I've seen no response so
> far, and I'm not sure whether any of you received it. On the other hand,
> based on what I've seen over these two days, you usually respond to
> issues promptly.
>
> The problem is that if you search on "ca??", the hits include 'cat',
> 'CA', etc., while the user only wants four-letter words starting with
> CA, such as 'card' and 'cash', to be returned. This happens only when
> there are multiple '?' at the end of the search pattern. The solution is
> to check whether the word being matched against the search pattern ends
> while there is still a '?' left. If so, the match should return false.
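
To make the intended rule concrete, here is a minimal standalone sketch
of a matcher in which every '?' must consume exactly one character. It
is not the WildcardTermEnum code and does not reproduce the patch; the
class and method names are hypothetical, and it is case-sensitive.

```java
class StrictWildcard {
    // Hypothetical helper: returns true if text[t..] matches pattern[p..].
    // '?' consumes exactly one character; '*' consumes any run, including none.
    static boolean matches(String pattern, int p, String text, int t) {
        if (p == pattern.length()) {
            // Pattern exhausted: match only if the text is exhausted too.
            return t == text.length();
        }
        char c = pattern.charAt(p);
        if (c == '*') {
            // Try every possible length for the run matched by '*'.
            for (int next = t; next <= text.length(); next++) {
                if (matches(pattern, p + 1, text, next)) {
                    return true;
                }
            }
            return false;
        }
        if (t == text.length()) {
            // The word ended while a '?' (or a literal) is still left: no match.
            return false;
        }
        if (c == '?' || c == text.charAt(t)) {
            return matches(pattern, p + 1, text, t + 1);
        }
        return false;
    }

    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    public static void main(String[] args) {
        for (String term : new String[] {"ca", "cat", "card", "cash", "cards"}) {
            System.out.println(term + " -> " + matches("ca??", term));
        }
    }
}
```

With this rule, "ca??" accepts 'card' and 'cash' and rejects 'ca', 'cat',
and 'cards'.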
>
> The patch file is attached and here is the text copy:
>
> ------------------------------------------------------------------------
> --- WildcardTermEnum.org	2004-05-11 11:42:10.000000000 -0400
> +++ WildcardTermEnum.java	2004-11-08 14:35:14.823610500 -0500
> @@ -132,6 +132,10 @@
>              }
>              else
>              {
> +	      //to prevent "cat" matches "ca??"
> +	      if(wildchar == WILDCARD_CHAR){
> +		return false;
> +	      }	
>                // Look at the next character
>                wildcardSearchPos++;
>              }
>
>
> ------------------------------------------------------------------------
> --
> Thanks!
>
> Xiaozheng
>
> <WildcardPatch.txt>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

