lucene-java-user mailing list archives

From tsuraan <tsur...@gmail.com>
Subject Re: Customer TokenFilter
Date Thu, 27 May 2010 20:37:54 GMT
> I'd like to have all my queries and terms run through Unicode
> Normalization prior to being executed/indexed.  I've been using the
> StandardAnalyzer with pretty good luck for the past few years, so I
> think I'd like to write an analyzer that wraps that, and tacks a
> custom TokenFilter onto the chain provided by the StandardAnalyzer.
> I'm really not clear, though, on how to write a TokenFilter.  My best
> guess is that I want to write a class that overrides getAttribute, and
> uses java.text.Normalizer to normalize any TermAttribute that is
> returned from the upstream filter.  Is that correct, or should I put
> my normalization somewhere else?  Are there any docs on making custom
> filters/analyzers?  I didn't have much luck finding any.

OK, I think that was probably the wrong approach, so I did something
different and modeled my filter on LowerCaseFilter.  If somebody could
take a look at what I've put up at
http://github.com/tsuraan/StandardNormalizingAnalyzer and tell me if
there's anything glaringly wrong with the approach, I'd really
appreciate it.  It passes the small unit tests I've written.
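
As a rough illustration of the per-token normalization step discussed
above, here is a minimal sketch that uses only java.text.Normalizer.
The NormalizingIterator class is a hypothetical stand-in for a
TokenFilter and is not Lucene's API; the choice of NFC is likewise an
assumption, and the code in the linked repository may differ.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NormalizingFilterSketch {

    /** Normalizes one term to NFC (an assumed form), skipping work if it is already normalized. */
    static String normalize(String term) {
        if (Normalizer.isNormalized(term, Normalizer.Form.NFC)) {
            return term;
        }
        return Normalizer.normalize(term, Normalizer.Form.NFC);
    }

    /**
     * Hypothetical stand-in for a token filter: it wraps an upstream token
     * source and rewrites each token as it passes through, much as a
     * lowercasing filter would.
     */
    static final class NormalizingIterator implements Iterator<String> {
        private final Iterator<String> input;

        NormalizingIterator(Iterator<String> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            return normalize(input.next());
        }
    }

    /** Drains the wrapped source into a list of normalized tokens. */
    static List<String> normalizeAll(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new NormalizingIterator(tokens.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // First token: "e" followed by a combining acute accent.
        // Second token: the precomposed "e with acute" code point.
        List<String> tokens = Arrays.asList("cafe\u0301", "caf\u00e9", "plain");
        List<String> normalized = normalizeAll(tokens);
        // After NFC normalization the two spellings compare equal.
        System.out.println(normalized.get(0).equals(normalized.get(1)));
    }
}
```

The same normalize step would need to be applied at both index time and
query time; otherwise a decomposed query term would not match a
precomposed indexed term.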
