lucene-java-user mailing list archives

From Tim Smith <forusew...@yahoo.com>
Subject Re: I need 'cat???' to match 'cat' again!
Date Thu, 07 Jun 2007 06:04:00 GMT
Hi!

The situation is the following:

In my native language, stemming is not available for
Lucene, AFAIK, and there are plenty of word forms that
need to be stemmed. People usually search with '*' at
the end of the word for this reason, but because of the
nature of our language, in most cases it expands to far
more terms than is acceptable. We've limited the number
of query clauses for performance reasons.

That's where '???' comes into the picture. We were
running on 1.4.x until now, and our Help and FAQ suggest
that people use '???' for stemming, since '*' works only
for long words (fewer expanded variations). Our stems
are mostly 1 to 3 chars long.

'???' was almost perfect for this purpose. It returned
the original word and most of the stemmed variations.
Now it is completely broken, so we need to find a
solution.

I know this is a special case for a special language,
but this is a real problem for us now.
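
One possible way to get the old matching back without patching Lucene is to expand the trailing question marks before the query string reaches the parser, so that 'cat???' becomes 'cat', 'cat?', 'cat??' and 'cat???' combined with OR. The sketch below is only an illustration: the class and method names are hypothetical and not part of Lucene, it handles trailing '?' characters only, and it assumes that under 2.1 each '?' matches exactly one character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: rewrites a term that ends in
// question marks into the variants the old lenient matching accepted.
final class TrailingWildcardExpander {

    // Returns the stem followed by the stem with 1..n trailing '?' characters,
    // where n is the number of trailing '?' characters in the input.
    static List<String> expand(String term) {
        int end = term.length();
        while (end > 0 && term.charAt(end - 1) == '?') {
            end--;
        }
        int marks = term.length() - end;
        String stem = term.substring(0, end);

        List<String> variants = new ArrayList<>();
        if (stem.isEmpty()) {
            // Nothing but wildcards (or an empty string): leave the term untouched.
            variants.add(term);
            return variants;
        }
        StringBuilder current = new StringBuilder(stem);
        variants.add(current.toString());
        for (int i = 0; i < marks; i++) {
            current.append('?');
            variants.add(current.toString());
        }
        return variants;
    }

    // Joins the variants into one parenthesized OR group for a query string.
    static String toQueryString(String term) {
        return "(" + String.join(" OR ", expand(term)) + ")";
    }
}
```

For example, TrailingWildcardExpander.toQueryString("cat???") yields (cat OR cat? OR cat?? OR cat???). Each variant is expanded against the index separately, so the total number of matching terms still counts toward the clause limit.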

Thanks,
Tim

--- Erick Erickson <erickerickson@gmail.com> wrote:

> Well, having your application depend upon incorrect behavior
> is...er...fraught.
>
> It looks like what you really want is custom behavior for multiple
> question marks, perhaps only with multiple question marks
> at the end of your query?
>
> If this is the case, I'd think about substituting splat (*) in this
> case at query time. So you simply transform
> cat??? to cat*....
>
> If that doesn't satisfy your requirements, perhaps you could
> post a more detailed explanation of what you're trying to
> accomplish.
> 
> Best
> Erick
> 
> On 6/6/07, Tim Smith <forusewith@yahoo.com> wrote:
> >
> > Hi!
> >
> > How can I restore the behavior of the old
> > WildcardQuery under 2.1?
> > I badly need 'cat???' to match 'cat' again just like
> > in the older versions.
> >
> > I could modify my instance of Lucene by removing those
> > "new" lines, but I don't want to maintain a custom
> > Lucene package.
> >
> > Please help!
> >
> > Tim
> >
> > Source: LUCENE-306
> >
> > > ********************************************************************
> > > --- WildcardTermEnum.org      2004-05-11 11:42:10.000000000 -0400
> > > +++ WildcardTermEnum.java     2004-11-08 14:35:14.823610500 -0500
> > > @@ -132,6 +132,10 @@
> > >              }
> > >              else
> > >              {
> > > +           //to prevent "cat" matches "ca??"
> > > +           if(wildchar == WILDCARD_CHAR){
> > > +             return false;
> > > +           }
> > >                // Look at the next character
> > >                wildcardSearchPos++;
> > >              }
> > > **********************************************************************

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

