lucene-dev mailing list archives

From "Christoph Kiehl" ...@sulu3000.de>
Subject Re: [PATCH] Refactoring QueryParser.jj, setLowercaseWildcardTerms()
Date Thu, 13 Feb 2003 17:05:34 GMT

D.L.B. wrote:

> Given that this is the case, I don't think it's possible to come up
> with a solution that will cover every case.  That said, I believe it
> is still worthwhile to try to do something reasonable to cover most
> cases.
>
> The company I work for has public text searchable websites in the
> following languages: English, Danish, Spanish, French, Dutch,
> Norwegian, Finnish, and Swedish.  The approach we took, as I
> mentioned in an earlier mail, was to only stem prefix and "suffix"
> queries (of the form *someText).  In these cases, don't pass the
> wildcard character to the stemmer and only use the stemmed result if
> it is a single word.
> [...]
> It turns out that this wildcard policy works well for us -- the users
> tend to get the results they expect.  Whatever solution falls out of
> this argument, I just wanted to mention what is working for us.  I'm
> thinking that adding a suffix term notion, parallel to prefix term in
> QueryParser.jj, creating subclassable methods to handle these, maybe
> providing a subclass that performs the imperfect stemming solution
> mentioned above, might be enough to please a lot of users.
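
The policy quoted above can be sketched roughly as follows. This is only an
illustration, not code from QueryParser.jj or from the patch: the class name
WildcardStemPolicy, the method rewrite() and the use of a plain
UnaryOperator<String> as a stand-in stemmer are all assumptions made for the
example. It strips the wildcard before stemming and falls back to the original
term unless the stemmer returns a single word.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper; not part of Lucene.
public class WildcardStemPolicy {

    /**
     * Rewrites a prefix query ("text*") or suffix query ("*text") by stemming
     * the text without its wildcard. Any other term is returned unchanged.
     * The stemmer is an assumed stand-in: any String -> String function.
     */
    public static String rewrite(String term, UnaryOperator<String> stemmer) {
        boolean prefix = term.length() > 1 && term.endsWith("*");
        boolean suffix = term.length() > 1 && term.startsWith("*");

        // Only handle terms with exactly one leading or trailing wildcard.
        if (prefix == suffix) {
            return term;
        }

        String body = prefix
                ? term.substring(0, term.length() - 1)
                : term.substring(1);

        // Leave terms with further wildcards inside them alone.
        if (body.indexOf('*') >= 0 || body.indexOf('?') >= 0) {
            return term;
        }

        // The wildcard character is never passed to the stemmer.
        String stemmed = stemmer.apply(body);

        // Use the stemmed form only if it is a single, non-empty word.
        if (stemmed == null || stemmed.isEmpty() || stemmed.matches(".*\\s.*")) {
            return term;
        }

        return prefix ? stemmed + "*" : "*" + stemmed;
    }
}
```

With a toy stemmer that merely drops a trailing "s", rewrite("*cats", toy)
would give "*cat", while a term such as "c?ts*" would pass through untouched.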

This might be the way to go. Perhaps we could extend it with a special
flag such as "%men*", or simply let users enclose the query in double
quotes, to prevent automatic stemming. That way one would also be able to
find "meningitis".
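
A rough sketch of how such an opt-out could be detected before any stemming
is applied. The names StemOptOut, skipsStemming() and stripMarker() are
hypothetical and nothing here exists in QueryParser; the "%" prefix and the
enclosing double quotes are just the two markers suggested above.

```java
// Hypothetical helper; not part of Lucene.
public class StemOptOut {

    // True if the query starts with the assumed "%" flag.
    private static boolean isFlagged(String query) {
        return query.length() > 1 && query.startsWith("%");
    }

    // True if the query is enclosed in double quotes.
    private static boolean isQuoted(String query) {
        return query.length() >= 2
                && query.startsWith("\"")
                && query.endsWith("\"");
    }

    /** Returns true if the user asked for the term to be left unstemmed. */
    public static boolean skipsStemming(String query) {
        return isFlagged(query) || isQuoted(query);
    }

    /** Removes the opt-out marker, leaving the raw term to search for. */
    public static String stripMarker(String query) {
        if (isFlagged(query)) {
            return query.substring(1);
        }
        if (isQuoted(query)) {
            return query.substring(1, query.length() - 1);
        }
        return query;
    }
}
```

One caveat: double quotes already mark phrase queries in the query syntax,
so a dedicated flag character might be the less ambiguous choice.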

Christoph





