lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Applying SpellChecker to a phrase
Date Sat, 08 Dec 2007 14:51:39 GMT
You might want to take a look at the TokenPhraseSuggester in  
LUCENE-626. It is more or less a FuzzySpanQuery, built from a matrix  
of tokens, but places one search for each possible query out of the  
matrix (with some optional parameters to minimze the query) to find a  
score and the hits for each possible match.

In my test cases this can end up beeing thousands or tens of thousands  
of queries to find the most plausable spelling. So rather than  
applying these queries to the main index, one wants to do it against a  
small a priori index.

http://issues.apache.org/jira/browse/LUCENE-626

TokenPhraseSuggester is quite well documented in code and there is  
some about it in the package level java docs too:
http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description



7 dec 2007 kl. 21.58 skrev Doron Cohen:

> smokey <smokeystu@gmail.com> wrote on 04/12/2007 16:54:32:
>
>> Thanks for the information on o.a.l.search.spans.
>>
>> I was thinking of parsing the phrase query string into a
>> sequence of terms,
>> then constructing a phrase query object using add(Term term,
>> int position)
>> method in org.apache.lucene.search.PhraseQuery class. Then I can  
>> inject
>> similar words (suggested by SpellChecker) at appropriate
>> positions for each
>> term as I construct the final phrase query object.
>>
>> Do you agree that this should work too?
>
> I never tried this but I'm sure it will not work.
>
> The phrase query scorer requires all the terms to
> appear - either at the 'right' place or with slop
> for sloppy phrases. Therefore if you inject two
> terms in the same position the scorer will require
> to find both of them in the same position in order to
> match a document. This would be an AND logic, while
> what you need is an OR logic.
>
> The phrase positions are handy for something else:
> supporting searching when stopwords where filtered
> (either at indexing or at search or both).
>
> Regards,
> Doron
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message