lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Woodward" <d...@loc.gov>
Subject Words that need protection from stemming, i.e., protwords.txt
Date Fri, 16 Jan 2009 23:20:08 GMT
Hi.

Any good protwords.txt out there?

In a fairly standard solr analyzer chain, we use the English Porter analyzer like so:

<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

For most purposes the porter does just fine, but occasionally words come along that really
don't work out to well, e.g.,

"maine" is stemmed to "main" - clearly goofing up precision about "Maine" without doing much
good for variants of "main".

So - I have an entry for my protwords.txt. What else should go in there?

Thanks for your ideas,

Dave Woodward


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message