lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <>
Subject RE: Customer TokenFilter
Date Thu, 27 May 2010 20:47:44 GMT
Looks correct! Wrapping by CharBuffer is very intelligent! In Lucene 3.1 the
new Term Attribute will implement CharSequence, then its even simplier. You
may also look at 3.1's ICU contrib that has support even for Normalizer2.

Overriding StandardAnalyzer is the wrong way, as in 3.1 its final (its only
accidentially non-final)! You should always extend Analyzer and build the
chain yourself. Or wrap another analyzer like in PerFieldAnalyzerWrapper.
The whole Tokenization API in Lucene is based on the delegator pattern, so
never extend any class not made for it!


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: tsuraan []
> Sent: Thursday, May 27, 2010 10:38 PM
> To: java-user
> Subject: Re: Customer TokenFilter
> > I'd like to have all my queries and terms run through Unicode
> > Normalization prior to being executed/indexed.  I've been using the
> > StandardAnalyzer with pretty good luck for the past few years, so I
> > think I'd like to write an analyzer that wraps that, and tacks a
> > custom TokenFilter onto the chain provided by the StandardAnalyzer.
> > I'm really not clear, though, on how to write a TokenFilter.  My best
> > guess is that I want to write a class that overrides getAttribute, and
> > uses java.text.Normalizer to normalize any TermAttribute that is
> > returned from the upstream filter.  Is that correct, or should I put
> > my normalization somewhere else?  Are there any docs on making custom
> > filters/analyzers?  I didn't have much luck finding any.
> Ok, I think that's probably the wrong approach, and I did something
> by imitating the LowerCaseFilter.  If somebody could take a look at what
> put up at, and tell
> me if there's something horrible about what I've done, I'd really
appreciate it.
> It passes the small unit tests I've made, but I'd really like to know if
> something glaringly wrong about my approach.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message