lucene-java-user mailing list archives

From Bill Janssen <>
Subject Re: AW: Best practices for multiple languages?
Date Wed, 19 Jan 2011 15:29:36 GMT
Paul Libbrecht <> wrote:

> I made several changes of this sort, and the precision and recall
> measures improved, in particular in the presence of language-indication
> failures, which were very common in our authoring environment.

There are two kinds of failures:  no language, or wrong language.

For no language, I fall back to StandardAnalyzer, so I should get
results similar to yours.  For the wrong language, well, I'm using
off-the-shelf (OTS) trigram-based language guessers, and they're pretty
good these days.
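The fallback strategy described above (choose a per-language analyzer from the guessed language, and use StandardAnalyzer when no language is known) can be sketched in plain Java. This is a minimal sketch, not Bill's actual code: the language-to-analyzer map is hypothetical, analyzers are represented by their class names so the example runs without Lucene on the classpath, and a wrong-language guess is simply trusted, as the message does. In a real index each name would correspond to an Analyzer instance such as Lucene's EnglishAnalyzer or GermanAnalyzer.

```java
import java.util.Locale;
import java.util.Map;

// Sketch only: analyzer choice is modeled as a String so this runs without Lucene.
class AnalyzerChooser {
    // Hypothetical mapping from ISO 639-1 codes to per-language analyzers.
    private static final Map<String, String> BY_LANGUAGE = Map.of(
        "en", "EnglishAnalyzer",
        "de", "GermanAnalyzer",
        "fr", "FrenchAnalyzer");

    // Language-neutral default used when there is no usable language indication.
    public static final String FALLBACK = "StandardAnalyzer";

    /** Returns the analyzer name for a possibly missing language code from a guesser. */
    public static String choose(String languageCode) {
        if (languageCode == null || languageCode.trim().isEmpty()) {
            return FALLBACK; // "no language" failure
        }
        String key = languageCode.trim().toLowerCase(Locale.ROOT);
        // Languages not in the map above also fall back to the default.
        return BY_LANGUAGE.getOrDefault(key, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(choose("EN"));  // EnglishAnalyzer
        System.out.println(choose(null));  // StandardAnalyzer
        System.out.println(choose("nl"));  // StandardAnalyzer (not in the map)
    }
}
```

Keeping the fallback in one place means documents with missing language metadata are still analyzed consistently rather than being skipped.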

> >> Wouldn't it be better to prefer precise matches (a field that is
> >> analyzed with StandardAnalyzer, for example) but also allow matches
> >> that are stemmed?

Yes, I think it might improve things, but again, by how much?  Stemming
beats no stemming in terms of recall, but this approach would also
improve precision.
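The quoted suggestion amounts to indexing each text twice, once unstemmed and once stemmed, and letting an exact hit outrank a stem-only hit. Stem-only matches still count (recall), while exact matches rank higher (precision). In Lucene this would typically be a BooleanQuery with two SHOULD clauses, one per field, with the exact-field clause boosted. The plain-Java sketch below only models the resulting scores. The boosts (2.0 and 1.0) are hypothetical, and the crude suffix-stripping `stem` method is an illustrative stand-in for a real stemmer such as Lucene's PorterStemFilter.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch of "prefer exact matches, but also allow stemmed matches".
class DualFieldScorer {
    // Hypothetical boosts: an exact (unstemmed) hit counts more than a stem-only hit.
    static final double EXACT_BOOST = 2.0;
    static final double STEMMED_BOOST = 1.0;

    /** Crude suffix stripper for illustration only (not Porter or Snowball). */
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Lowercased word tokens: the "exact" field. */
    static Set<String> exactTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Stemmed tokens: the "stemmed" field. */
    static Set<String> stemmedTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : exactTerms(text)) terms.add(stem(t));
        return terms;
    }

    /** Sums contributions from both fields, mirroring two optional (SHOULD) clauses. */
    static double score(String query, String document) {
        Set<String> exact = exactTerms(document);
        Set<String> stemmed = stemmedTerms(document);
        double score = 0.0;
        for (String q : exactTerms(query)) {
            if (exact.contains(q)) score += EXACT_BOOST;          // precise match
            if (stemmed.contains(stem(q))) score += STEMMED_BOOST; // stemmed match
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("indexing", "Indexing large corpora")); // 3.0
        System.out.println(score("indexing", "We index documents"));     // 1.0
        System.out.println(score("indexing", "Cooking recipes"));        // 0.0
    }
}
```

Because an exact hit also matches in the stemmed field, it always scores at least as high as a stem-only hit, so precise matches rise to the top without losing the extra recall from stemming.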

