lucene-java-user mailing list archives

From tsuraan
Subject Custom TokenFilter
Date Wed, 26 May 2010 22:29:12 GMT
I'd like to run all my queries and terms through Unicode
normalization before they are executed or indexed.  I've been using the
StandardAnalyzer with pretty good luck for the past few years, so I
think I'd like to write an analyzer that wraps it and tacks a
custom TokenFilter onto the end of the chain StandardAnalyzer provides.
I'm really not clear, though, on how to write a TokenFilter.  My best
guess is that I want to write a class that overrides getAttribute and
uses java.text.Normalizer to normalize any TermAttribute that is
returned from the upstream filter.  Is that correct, or should I put
my normalization somewhere else?  Are there any docs on writing custom
filters/analyzers?  I didn't have much luck finding any.
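
To make the normalization step concrete, here is a minimal sketch that uses only
java.text.Normalizer.  The class name TermNormalizer and its normalize helper are
hypothetical, and NFC is an assumed choice of form (NFKC may suit search better);
how this would be wired into a TokenFilter is not shown.

```java
import java.text.Normalizer;

public class TermNormalizer {
    // Hypothetical helper: return the NFC form of a term.
    // isNormalized lets already-normalized terms skip the copy.
    static String normalize(String term) {
        if (Normalizer.isNormalized(term, Normalizer.Form.NFC)) {
            return term;
        }
        return Normalizer.normalize(term, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301"; // 'e' followed by a combining acute accent
        String composed = "caf\u00e9";    // precomposed e-acute

        // The raw strings differ, so they would index as different terms.
        System.out.println(decomposed.equals(composed));            // prints "false"
        // After normalization both spellings collapse to the same term.
        System.out.println(normalize(decomposed).equals(composed)); // prints "true"
    }
}
```

The same helper would have to be applied on both the indexing side and the query
side, otherwise the two spellings would still fail to match.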
