lucene-java-user mailing list archives

From Paul Hill <p...@metajure.com>
Subject RE: Stemming - limited index expansion
Date Tue, 12 Jun 2012 23:43:36 GMT
Thanks for the reply.

> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Tuesday, June 12, 2012 1:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Stemming - limited index expansion
> 
> I don't completely follow precisely what you want to do, but the WordDelimiterFilter
> is an example of a token filter that outputs an extra token at the same position,
> such as with its CATENATE_ALL/WORDS/NUMBERS options.

Thanks for directing me to that. I'm currently using 3.4, and it doesn't appear in the
3.6 code base.
If it doesn't show up until 4.0+ (your link is actually to 5.0!), I know that
   "Terms are no longer required to be character based. Lucene views a term as an arbitrary
   byte[]"
	-- https://builds.apache.org/job/Lucene-trunk/javadoc/changes/Changes.html#4.0.0-alpha.api_changes
But hopefully it is at the right level to suggest how this would be done using the old
CharRef instead of whatever the new stuff uses (ByteRef?).
I'll take a look.

> Maybe you simply want to internally call some existing stemmer filter and output both
> the original and stemmed term at the same location?

Yes, that is very close to what I want to do, possibly with the one addition of stemming
only a limited set of words (but more than just plurals).
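
To make the idea concrete, here is a rough sketch in plain Java. It deliberately does not use
any Lucene classes: Token, LimitedStemExpander and the stem lookup table are all hypothetical
names made up for illustration. It only models the behaviour I am after: every original word
is kept, and for words in a limited table the stem is added at the same position (a position
increment of 0, which is how a Lucene token filter would stack the extra token via
PositionIncrementAttribute).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analyzed token: a term plus its position increment.
final class Token {
    final String term;
    final int positionIncrement; // 1 = next position, 0 = same position as the previous token

    Token(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return term + "(+" + positionIncrement + ")";
    }
}

// Hypothetical expander: keeps each original word and, only for words found in
// the limited stem table, adds the stem at the same position.
final class LimitedStemExpander {
    private final Map<String, String> stems; // assumption: word -> stem for the limited set

    LimitedStemExpander(Map<String, String> stems) {
        this.stems = stems;
    }

    List<Token> expand(List<String> words) {
        List<Token> out = new ArrayList<Token>();
        for (String word : words) {
            // The original term always advances the position by one.
            out.add(new Token(word, 1));
            String stem = stems.get(word);
            // Words outside the limited set (or already equal to their stem) are not expanded.
            if (stem != null && !stem.equals(word)) {
                // Increment 0 stacks the stem on top of the original term.
                out.add(new Token(stem, 0));
            }
        }
        return out;
    }
}
```

With a table containing only "mice" -> "mouse" and "running" -> "run", the words
"mice were running" would come out as mice, mouse (same position), were, running,
run (same position), so a search for either the original or the stemmed form matches,
while "were" is left alone.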

-Paul
