lucene-java-user mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: Error tolerant text search with Lucene?
Date Fri, 04 Apr 2008 13:59:38 GMT
Marjan Celikik wrote:
> Mathieu Lecarme wrote:
>>
>>> However, I don't fully understand what you mean by "iterate over 
>>> your query". I would like a conceptual answer on how this is done with 
>>> Lucene, not a technical one.
>> Your query is a tree, with BooleanQuery nodes as branches and other 
>> queries as leaves. If you want to transform a query into a "tolerant 
>> query", you have to replace each TermQuery (a leaf) with an "OR" branch 
>> whose leaves are the variant terms.
>>
>> To find the variants of a term, you take the list of your terms and 
>> apply a filter to it to group them. Common filters for this are 
>> stemming, n-gram + Levenshtein distance, phonetic encoding ...
>>
>> M.
>>
> OK, now it's clearer. My final question is: when is this filter 
> information incorporated, at index time or at search time?
Both. You have two indexes: one for your data and one for your terms. The 
second (dictionary, lexicon ...) uses one Document per term, with n 
Fields holding information such as n-grams or a phonetic code. When you 
search for a near word, you derive the same data from that word, build a 
query from it, and sort the results by Levenshtein distance. You get an 
ordered list of suggestions.
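
To make the lookup above concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class LexiconSuggester, the in-memory map standing in for the term index, the "$" padding and the choice of trigrams are all assumptions made for illustration. In a real setup the map would be the second index, with one Document per term and the n-grams stored in its Fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the term index: maps each trigram to the terms containing it.
public class LexiconSuggester {
    private final Map<String, Set<String>> termsByGram = new HashMap<>();

    // Split a word into trigrams; "$" marks the word boundaries (an assumption).
    static Set<String> trigrams(String word) {
        Set<String> grams = new LinkedHashSet<>();
        String padded = "$" + word + "$";
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    // "Index time": register a term under each of its trigrams.
    void addTerm(String term) {
        for (String gram : trigrams(term)) {
            termsByGram.computeIfAbsent(gram, k -> new HashSet<>()).add(term);
        }
    }

    // Classic dynamic-programming edit distance, keeping two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // "Search time": collect terms sharing at least one trigram with the word,
    // then compute the edit distance on those candidates only and sort by it.
    List<String> suggest(String word, int maxDistance) {
        Set<String> candidates = new HashSet<>();
        for (String gram : trigrams(word)) {
            candidates.addAll(termsByGram.getOrDefault(gram, Collections.emptySet()));
        }
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(word, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        result.sort(Comparator.comparingInt((String c) -> levenshtein(word, c))
                .thenComparing(Comparator.naturalOrder()));
        return result;
    }
}
```

The n-grams are built when a term is added, and the edit distance is only computed for the few terms that share a trigram with the searched word.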
> i.e. I want to know whether the Levenshtein distance is computed at 
> query time or whether this information is precomputed in the index?
First Lucene selects the candidates, then you pick the best from that list. 
Levenshtein distance is computed at query time, but only on those few 
candidate words.
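
Once the best variants are picked, they go back into the query tree described earlier in this thread: each term leaf becomes an "OR" branch over the term and its variants. The sketch below shows that rewrite in plain Java. TermNode and BoolNode are hypothetical stand-ins for Lucene's TermQuery and BooleanQuery, not the real classes, and variantsOf is an assumed lookup function (for example, a suggester built on the term index).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TolerantRewrite {
    // Hypothetical query tree, modelled loosely on BooleanQuery (branch) and TermQuery (leaf).
    interface Node {}

    enum Op { AND, OR }

    static final class TermNode implements Node {
        final String field;
        final String text;

        TermNode(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    static final class BoolNode implements Node {
        final Op op;
        final List<Node> children;

        BoolNode(Op op, List<Node> children) {
            this.op = op;
            this.children = children;
        }

        @Override
        public String toString() {
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.toString());
            }
            return "(" + String.join(" " + op + " ", parts) + ")";
        }
    }

    // Walk the tree; replace each term leaf with an OR branch over the term and its variants.
    static Node makeTolerant(Node query, Function<String, List<String>> variantsOf) {
        if (query instanceof TermNode) {
            TermNode term = (TermNode) query;
            List<Node> alternatives = new ArrayList<>();
            alternatives.add(term);
            for (String variant : variantsOf.apply(term.text)) {
                if (!variant.equals(term.text)) {
                    alternatives.add(new TermNode(term.field, variant));
                }
            }
            // A term without variants stays a plain leaf.
            return alternatives.size() == 1 ? term : new BoolNode(Op.OR, alternatives);
        }
        BoolNode branch = (BoolNode) query;
        List<Node> rewritten = new ArrayList<>();
        for (Node child : branch.children) {
            rewritten.add(makeTolerant(child, variantsOf));
        }
        return new BoolNode(branch.op, rewritten);
    }
}
```

The original term is kept as the first alternative, so an exact match still satisfies the rewritten query.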

M.
