lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera" <ser...@gmail.com>
Subject Re: matching products with suggest feature
Date Thu, 14 Feb 2008 06:44:47 GMT
Is this Speller class a Lucene class? I didn't find it in the main code
stream, maybe it's part of contrib?

Anyway, still it depends how it is implemented (OR or AND). For example,
someone indexed a document with the word "abcde" and the index keeps the
ngrams "abc", "bcd" and "cde". Then somebody types in "abc", what would the
speller suggest? What would the speller suggest for "abce"?
If it works in an OR mode, I assume it would suggest "abcde" for both, as
"abc" appears in both. But if it works in AND mode, then for the first it
will suggest "abcde" but for the second it won't suggest it because the
ngrams produced are "abc" and "bce" .. and "bce" does not appear in "abcde".

Am I right? If not, can you elaborate more on the Speller class you use?

On Wed, Feb 13, 2008 at 8:19 PM, Cam Bazz <cambazz@gmail.com> wrote:

> Hello Shai,
>
> The class that does the matching is Speller.
> It does not work query based but rather there is a method called -
> suggestSimilar(String word, int numSug); where the numSug is number of
> suggestions. The words are kept in the index as ngrams. For example abcde
> is
> kept as abc bcd cde.
> So this is not normal query like we all know.
>
> Best regards,
> C.B.
>
>
> On Feb 13, 2008 7:00 PM, Shai Erera <serera@gmail.com> wrote:
>
> > What is the default Operator of your QueryParser? Is it AND_OPERATOR or
> > OR_OPERATOR. If it's OR ... then it's strange. If it's AND, then once
> you
> > add more terms than what exists, it won't find anything.
> >
> > On Feb 13, 2008 6:54 PM, Cam Bazz <cambazz@gmail.com> wrote:
> >
> > > Hello;
> > >
> > > I am trying to make a product matcher based on lucene's ngram based
> > > suggest.
> > > I did some changes so that instead of giving the speller a dictionary
> I
> > > feed
> > > it with a List<String>.
> > >
> > > For example lets say I have "HP NC4400 EY605EA CORE 2 DUO T5600
> > > 1.83GHz/512MB/80GB/12.1''
> > > NOTEBOOK"
> > > and I index it with speller using an ngram approach.
> > >
> > > It works quite well - when using the suggest feature, for example if
> the
> > > user submits something similar. similar as in the string lenght is
> > > relatively equal, a word or two might be mistyped - or even missing,
> > > lucene
> > > finds it.
> > > However - when the user submits the same product - but with much less
> or
> > > much more string length - for example "HP NC4400 EY605EA" or "HP
> NC4400
> > > EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK WITH
> WINDOWS
> > > XP
> > > AND GIFT MOUSE" - the suggester wont work.
> > >
> > > I am not sure about the ngrams approach any more.
> > >
> > > Any ideas/recomendations/help greatly appreciated.
> > >
> > > Best Regards,
> > > C.B.
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Shai Erera
> >
>



-- 
Regards,

Shai Erera

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message