lucene-java-user mailing list archives

From smokey <smokey...@gmail.com>
Subject Re: Applying SpellChecker to a phrase
Date Tue, 04 Dec 2007 14:54:32 GMT
Thanks for the information on o.a.l.search.spans.

I was thinking of parsing the phrase query string into a sequence of terms,
then constructing a phrase query object using the add(Term term, int position)
method of the org.apache.lucene.search.PhraseQuery class. That way I can inject
the similar words suggested by SpellChecker at the appropriate position for
each term as I construct the final phrase query object.

Do you agree that this should work too?
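
To make the idea concrete, here is a minimal sketch in plain Java (no Lucene
dependency) of the "one slot per position" expansion. It only shows that
per-position alternatives cover the same phrases as the long OR of full
phrases, and how many term entries each form carries. The class name
PhraseExpansion and its methods are made up for illustration, and the
suggestion lists are hard-coded rather than taken from SpellChecker. One
caveat, as far as I understand the API: PhraseQuery requires every added term
to match, so alternatives at the same position are normally expressed with
MultiPhraseQuery or with the span queries mentioned below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseExpansion {

    /**
     * Expands per-position alternatives into every concrete phrase.
     * slots.get(i) holds the original term at position i plus any
     * suggested variants for it (hypothetical data, not SpellChecker output).
     */
    static List<String> expand(List<List<String>> slots) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> alternatives : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String term : alternatives) {
                    next.add(prefix.isEmpty() ? term : prefix + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    /** Number of term entries in the per-position form. */
    static int slotTermCount(List<List<String>> slots) {
        int count = 0;
        for (List<String> alternatives : slots) {
            count += alternatives.size();
        }
        return count;
    }

    /** Number of term entries when every expanded phrase is spelled out. */
    static int expandedTermCount(List<List<String>> slots) {
        return expand(slots).size() * slots.size();
    }

    public static void main(String[] args) {
        List<List<String>> slots = Arrays.asList(
            Arrays.asList("the"),
            Arrays.asList("login"),
            Arrays.asList("fraud", "fruad"),
            Arrays.asList("involves"),
            Arrays.asList("an"),
            Arrays.asList("impostor", "imposter"));

        for (String phrase : expand(slots)) {
            System.out.println("\"" + phrase + "\"");
        }
        System.out.println("per-position term entries: " + slotTermCount(slots));
        System.out.println("fully expanded term entries: " + expandedTermCount(slots));
    }
}
```

The counts say nothing about actual query cost in Lucene; they only show how
quickly the spelled-out form grows as more positions get alternatives.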

On Dec 4, 2007 1:22 AM, Doron Cohen <DORONC@il.ibm.com> wrote:

> See below -
>
> smokey <smokeystu@gmail.com> wrote on 03/12/2007 05:14:23:
>
> > Suppose I have an index containing the terms impostor, imposter, fraud,
> > and fruad, then presumably regardless of whether I spell impostor and
> > fraud correctly, Lucene SpellChecker will offer the improperly spelled
> > versions as corrections. This means that the phrase "The login fraud
> > involves an impostor" would need to expand to:
> >
> > "The login fraud involves an impostor" OR "The login fruad involves an
> > impostor" OR "The login fraud involves an imposter" OR "The login fruad
> > involves an imposter" to cover all cases and thus find all possible
> > matches.
> >
> > However, that feels like an awful lot of matches to perform on the
> > index. A more efficient approach would be to expand the query to "The
> > login (fraud OR fruad) involves an (impostor OR imposter)", which should
> > be logically equivalent to the first (longer) query.
> >
> > So my questions are:
> > (1) whether others have generated "The login (fraud OR fruad) involves an
> > (impostor OR imposter)" types of queries when applying SpellChecker to a
> > phrase, and agree that this indeed performs better than the first form;
> > (2) whether others have observed any problems in doing so, in terms of
> > performance or anything else.
> >
> > Any information would be appreciated.
>
> Lucene phrase query does not support 'sub parts'. But you may
> want to look at o.a.l.search.spans. It seems that a span-near query
> made of span-term queries and span-or queries, setting (max)span as
> ~the length of your phrase and setting in-order=true would get
> pretty close.
>
> About performance, I hope others can comment, because I never compared
> this to phrase query. When you do try this, please tell us of any
> interesting performance results!
>
> Regards,
> Doron
