Hi Christian,
You will need to create your own Tokenizer.
Use the StandardTokenizer.jj file as a guide and instead of using a tokens
like
// basic word: a sequence of digits & letters
<ALPHANUM: (<LETTER>|<DIGIT>)+ >
Use
<ALPHAONLY: (<LETTER>)+>
And
<NUMONLY: (<DIGIT>)+>
I don't know what your patterns are, but this will help you out.
Also, you may have to change the QueryParser.jj to do the same thing.
--Peter
On 5/29/02 2:19 AM, "Christian Schrader" <schrader.news@evendi.de> wrote:
> I need to construct a Tokenizer that tokenizes at word/number boundaries, so
> that "IBM Deskstar IC35L060AVER07" would result in the following tokens:
> IBM
> Deskstar
> IC
> 35
> L
> 060
> AVER
> 07
>
> Has anybody solved this with the StandardTokenizer?
>
> Christian
>
>
> --
> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>
>
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
|