lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: I need 'cat???' to match 'cat' again!
Date Thu, 07 Jun 2007 13:16:38 GMT
Well, what you're really doing, in your example, is searching
on all the terms that start with cat and are less than 7 characters
long.

So it seems to me that you can pick out the terms yourself and assemble
your own big OR clause rather than rely on Lucene's old behavior.

By that, I mean use a WildcardTermEnum on cat*. As you enumerate
over all the terms, add the term to an OR clause if it's less than
7 characters long.

Really, this is what was happening under the covers before, but
since the behavior has changed (actually been corrected), you probably
can emulate it like this.
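
As a rough sketch of the selection step only: the code below is plain Java
with no Lucene calls, and the class name, method name and sample terms are
made up for illustration. It assumes the pattern is a literal prefix followed
by a run of trailing '?'. In a real application the terms would come from the
WildcardTermEnum over cat*, and each kept term would become one clause of the
OR query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrailingWildcardSketch {

    /**
     * Returns the terms the old behavior would have matched for a pattern
     * such as "cat???": terms that start with the literal prefix and have at
     * most one extra character per trailing '?'.
     */
    public static List<String> selectTerms(Iterable<String> terms, String pattern) {
        // Find where the run of trailing '?' begins.
        int prefixLen = pattern.length();
        while (prefixLen > 0 && pattern.charAt(prefixLen - 1) == '?') {
            prefixLen--;
        }
        String prefix = pattern.substring(0, prefixLen);
        // Prefix length plus one character per '?', i.e. "less than 7" for "cat???".
        int maxLen = pattern.length();

        List<String> selected = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix) && term.length() <= maxLen) {
                selected.add(term); // would become one clause of the OR query
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical index terms, standing in for the term enumeration.
        List<String> indexTerms =
                Arrays.asList("car", "cat", "cats", "catnip", "catalog", "dog");
        System.out.println(selectTerms(indexTerms, "cat???")); // prints [cat, cats, catnip]
    }
}
```

Because the length check caps what gets added, the number of clauses should
stay well below what a plain cat* would expand to, which matters if you have
limited the clause count.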

Hope this helps
Erick

On 6/7/07, Tim Smith <forusewith@yahoo.com> wrote:
>
> Hi!
>
> The situation is the following:
>
> In my native language, stemming is not available for
> Lucene AFAIK, and there are plenty of word forms
> that need to be stemmed. People usually search with
> '*' at the end of the word for this reason, but
> because of the nature of our language, in most
> cases it expands to many more words than is acceptable.
> We've limited the number of query clauses for
> performance reasons.
> That's where '???' comes into the picture. We were running
> on 1.4.x until now, and our Help and FAQ suggest that
> people use '???' for stemming, since '*' works
> only for long words (fewer expanded variations). (Our
> stems are mostly 1-2 or 3 chars long.)
> '???' was almost perfect for this purpose. It returned
> the original word and most of the stemmed variations.
> Now it is completely broken, so we need to find a
> solution.
> I know this is a special case for a special language,
> but this is a real problem for us now.
>
> Thanks,
> Tim
>
> --- Erick Erickson <erickerickson@gmail.com> wrote:
>
> > Well, having your application depend upon incorrect behavior
> > is...er...fraught.
> >
> > It looks like what you really want is custom behavior for multiple
> > question marks, perhaps only with multiple question marks
> > at the end of your query?
> >
> > If this is the case, I'd think about substituting splat (*) in this
> > case at query time. So you simply transform cat??? to cat*....
> >
> > If that doesn't satisfy your requirements, perhaps you could
> > post a more detailed explanation of what you're trying to
> > accomplish.
> >
> > Best
> > Erick
> >
> > On 6/6/07, Tim Smith <forusewith@yahoo.com> wrote:
> > >
> > > Hi!
> > >
> > > How can I restore the behavior of the old
> > > WildcardQuery under 2.1?
> > > I badly need 'cat???' to match 'cat' again, just like
> > > in the older versions.
> > >
> > > I could modify my instance of Lucene by removing those
> > > "new" lines, but I don't want to maintain a custom
> > > Lucene package.
> > >
> > > Please help!
> > >
> > > Tim
> > >
> > >
> > >
> > >
> > > > Source: LUCENE-306
> > > >
> > > > > ********************************************************************
> > > > > --- WildcardTermEnum.org      2004-05-11 11:42:10.000000000 -0400
> > > > > +++ WildcardTermEnum.java     2004-11-08 14:35:14.823610500 -0500
> > > > > @@ -132,6 +132,10 @@
> > > > >              }
> > > > >              else
> > > > >              {
> > > > > +           //to prevent "cat" matches "ca??"
> > > > > +           if(wildchar == WILDCARD_CHAR){
> > > > > +             return false;
> > > > > +           }
> > > > >                // Look at the next character
> > > > >                wildcardSearchPos++;
> > > > >              }
> > > > >
> > > > > **********************************************************************
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
