lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Questions about FuzzyQuery in Lucene 4.x
Date Tue, 29 Jan 2013 19:50:23 GMT
I'm sorry, but for anybody to help you here, you really need to be able to 
provide a concise test case, like 10-20 lines of code, completely 
self-contained. If you think you need a million documents to repro what you 
claimed was a simple scenario, then you leave me very, very confused - and 
unable to help you any further.

-- Jack Krupansky

-----Original Message----- 
From: George Kelvin
Sent: Tuesday, January 29, 2013 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Questions about FuzzyQuery in Lucene 4.x

Hi Jack,

The problematic query is "scar"+"wads".

There are several (more than 10) documents in the data with the content
"star wars", so I think that query should be able to find all these
documents.
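
As a sanity check on the distances involved, here is a minimal sketch in plain Java. It is not Lucene's implementation: the class name EditDistanceSketch and the helpers levenshtein and matchesAll are hypothetical, it uses plain Levenshtein distance (no transpositions), and it only models the "every word MUST match" idea by brute force over whitespace-separated tokens.

```java
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force stand-in for a BooleanQuery of MUST clauses:
    // true only if every query word is within maxEdits of some token in doc.
    static boolean matchesAll(String doc, String[] queryWords, int maxEdits) {
        String[] tokens = doc.split("\\s+");
        for (String q : queryWords) {
            boolean found = false;
            for (String t : tokens) {
                if (levenshtein(q, t) <= maxEdits) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] query = "scar+wads".split("\\+");
        System.out.println("scar/star: " + levenshtein("scar", "star"));
        System.out.println("wads/wars: " + levenshtein("wads", "wars"));
        System.out.println("scar/scor: " + levenshtein("scar", "scor"));
        System.out.println("matches \"star wars\": " + matchesAll("star wars", query, 1));
        System.out.println("matches \"scor\": " + matchesAll("scor", query, 1));
    }
}
```

Under these assumptions both query words are one edit away from the words in "star wars", while "scor" is within one edit of "scar" only and has nothing close to "wads".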

I was trying to provide a minimal test case, but I couldn't reduce the size
of data showing the failure.

The smallest data set I have found so far that still shows the failure has
around 2 million documents.

However, I found a suspicious document with the content "scor". If I remove
it from the 2 million documents, the query finds all the "star wars"
documents. If I add it back, the query can't find any of them.

I tried reducing the data further to 1 million documents and adding that
"scor" document back, but in that case the query still finds all the
"star wars" documents.

Is it possible that Lucene somehow fails to find all the valid terms within
the edit distance?

Thanks!

George


On Tue, Jan 29, 2013 at 10:02 AM, Jack Krupansky 
<jack@basetechnology.com> wrote:

> I also noticed that you have "MUST" for your full string of fuzzy terms -
> that means every one of them must appear in an indexed document for it to
> be matched. Is it possible that even one term was not in the same
> indexed document?
>
> Try to provide a complete example that shows the input data and the query
> - all the literals. In other words, construct a minimal test case that
> shows the failure.
>
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: George Kelvin
> Sent: Tuesday, January 29, 2013 12:28 PM
> To: java-user@lucene.apache.org
> Subject: Re: Questions about FuzzyQuery in Lucene 4.x
>
> Hi Jack,
>
> ed is set to 1 here and I have lowercased all the data and queries.
>
> Regarding the indexed data factor you mentioned, can you elaborate more?
>
> Thanks!
>
> George
>
>
> On Tue, Jan 29, 2013 at 9:10 AM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
>> That depends on the value of "ed", and the indexed data.
>>
>> Another factor to take into consideration is that a case change ("Star"
>> vs. "star") also counts as an edit.
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: George Kelvin
>> Sent: Tuesday, January 29, 2013 11:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Questions about FuzzyQuery in Lucene 4.x
>>
>>
>> Hi Jack,
>>
>> Thanks for your reply!
>>
>> I don't think I passed the prefixLength parameter in.
>>
>> Here is the code I used to build the FuzzyQuery:
>>
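>>            // Split the "+"-separated query string into individual words.
>>            String[] words = str.split("\\+");
>>            BooleanQuery query = new BooleanQuery();
>>
>>            // One FuzzyQuery per word; MUST means every word has to match.
>>            for (int i = 0; i < words.length; i++)
>>            {
>>                Term t = new Term(field, words[i]);
>>                // ed is the maximum edit distance allowed for this term.
>>                FuzzyQuery fq = new FuzzyQuery(t, ed);
>>                query.add(fq, BooleanClause.Occur.MUST);
>>            }
>>
>>            // Fetch the top k hits.
>>            int k = 10;
>>            TopDocs results = searcher.search(query, k);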
>>
>> Does it look right to you?
>>
>> Thanks!
>>
>> George
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

