lucene-java-user mailing list archives

From Bill Janssen <jans...@parc.com>
Subject Re: AW: Best practices for multiple languages?
Date Wed, 19 Jan 2011 15:29:36 GMT
Paul Libbrecht <paul@hoplahup.net> wrote:

> I did several changes of this sort and the precision and recall
> measures went better in particular in presence of language-indication
> failure which happened to be very common in our authoring environment.

There are two kinds of failures:  no language, or wrong language.

For no language, I fall back to StandardAnalyzer, so I should have
results similar to yours.  For wrong language, well, I'm using
off-the-shelf trigram-based language guessers, and they're pretty good
these days.
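
To make the "no language" fallback concrete, here is a minimal sketch in
plain Java.  The class name, the language codes, and the idea of returning
an analyzer name as a string are assumptions made for illustration, not my
actual code; treating an unrecognized language like a missing one is also
an assumption.

```java
import java.util.Locale;
import java.util.Map;

public class AnalyzerChooser {
    // Hypothetical mapping from ISO 639-1 codes to analyzer class names.
    private static final Map<String, String> BY_LANGUAGE = Map.of(
            "de", "GermanAnalyzer",
            "fr", "FrenchAnalyzer",
            "nl", "DutchAnalyzer");

    // Returns the analyzer name for a detected language code.
    // A missing or empty code falls back to StandardAnalyzer; so does a
    // code with no dedicated entry (assumption).
    static String choose(String languageCode) {
        if (languageCode == null || languageCode.trim().isEmpty()) {
            return "StandardAnalyzer";
        }
        String key = languageCode.trim().toLowerCase(Locale.ROOT);
        return BY_LANGUAGE.getOrDefault(key, "StandardAnalyzer");
    }
}
```

Note that this handles only the "no language" case.  A wrong guess still
selects the wrong analyzer, which is why the quality of the guesser matters.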

> >> Wouldn't it be better to prefer precise matches (a field that is
> >> analyzed with StandardAnalyzer, for example) but also allow matches
> >> that are stemmed?

Yes, I think it might improve things, but again, by how much?  Stemming
already beats no stemming in terms of recall.  Preferring exact matches
on top of the stemmed ones would keep that recall and also improve
precision.
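
A toy illustration of "prefer exact, but allow stemmed", in plain Java with
no Lucene dependency.  The weights, the class name, and the deliberately
naive stemmer (it only strips a trailing "s") are all assumptions chosen
for the example.  In Lucene the rough equivalent would be indexing the same
text in an unstemmed field and a stemmed field and boosting the unstemmed
one.

```java
import java.util.Locale;

public class ExactPlusStemmed {
    static final double EXACT_WEIGHT = 2.0;   // assumed boost for exact matches
    static final double STEMMED_WEIGHT = 1.0; // assumed weight for stemmed matches

    // Naive stand-in for a real stemmer: strips one trailing "s" from
    // tokens longer than three characters.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Scores one query term against a document: an exact token match and
    // a stemmed match each contribute once, like two optional clauses.
    static double score(String queryTerm, String document) {
        String q = queryTerm.toLowerCase(Locale.ROOT);
        String qStem = stem(q);
        boolean exact = false;
        boolean stemmed = false;
        for (String t : document.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.equals(q)) {
                exact = true;
            }
            if (stem(t).equals(qStem)) {
                stemmed = true;
            }
        }
        return (exact ? EXACT_WEIGHT : 0.0) + (stemmed ? STEMMED_WEIGHT : 0.0);
    }
}
```

With these weights a document containing the query term verbatim outranks
one that matches only after stemming, while the stemmed-only document still
scores above zero, so recall is kept.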

Bill
