lucene-java-user mailing list archives

Subject Re: Filter before tokenize ?
Date Sat, 12 Sep 2009 19:50:09 GMT
--- On Sat, 9/12/09, Paul Taylor <> wrote:

> From: Paul Taylor <>
> Subject: Filter before tokenize ?
> To:
> Date: Saturday, September 12, 2009, 9:39 PM
> Is it possible to filter before
> tokenize, or is that not a good idea.
> I want to convert '&' to 'and' , so they are dealt with
> the same way, but the StandardTokenizer I am using removes
> the &, I could change the tokenizer but  because
> I'm not too clear on jflex syntax it would seem easier to
> just apply a CharFilter before tokenizing, but is that
> possible

Maybe you can use WhitespaceTokenizer, which won't remove the &?
Why are ampersands (&) important for you? Do you need to search for them?
Could replacing the &'s before indexing (by preprocessing) be an option?
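
A rough sketch of the preprocessing option, assuming you can run plain Java
over the text before it is handed to the analyzer. The class and method names
(AmpersandPreprocessor, normalize) are made up for illustration and are not
part of Lucene; the same call would have to be applied to query text too, so
that both sides agree.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: rewrites '&' as the word "and"
// before the text is passed on for tokenizing.
class AmpersandPreprocessor {

    // An ampersand together with any whitespace directly around it.
    private static final Pattern AMPERSAND = Pattern.compile("\\s*&\\s*");

    // Replaces each '&' with " and " and trims leading/trailing whitespace
    // from the result, so "R&B" and "R & B" come out the same.
    static String normalize(String text) {
        return AMPERSAND.matcher(text).replaceAll(" and ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Simon & Garfunkel"));
        System.out.println(normalize("R&B"));
    }
}
```

The normalized string can then go through StandardTokenizer unchanged, since
there is no & left for it to drop.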

Filter before tokenizer can be simulated by using:

2-)Your CharFilter
3-)A token filter that tokenizes input token's text using StandardTokenizer

But I don't think this is a good idea.

Hope this helps.

