lucene-java-user mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: JavaCC Tokenizer
Date Wed, 29 May 2002 14:35:16 GMT
Hi Christian,

You will need to create your own Tokenizer.
Use the StandardTokenizer.jj file as a guide, and instead of using a token
definition like


  // basic word: a sequence of digits & letters
  <ALPHANUM: (<LETTER>|<DIGIT>)+ >


Use

<ALPHAONLY: (<LETTER>)+>

And

<NUMONLY: (<DIGIT>)+>


I don't know what your exact patterns are, but this should get you started.
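
To check what the two rules above should produce before touching the grammar, here is a rough plain-Java sketch. It is not a Lucene Tokenizer and does not use JavaCC. The class name AlphaNumSplit and the regular expression are assumptions made for illustration, with \p{L} and \p{Nd} standing in for the grammar's <LETTER> and <DIGIT> definitions, which may cover a different character set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: mimics the ALPHAONLY / NUMONLY split.
class AlphaNumSplit {
    // Either a run of letters (like ALPHAONLY) or a run of digits (like NUMONLY).
    // Assumption: \p{L} and \p{Nd} approximate <LETTER> and <DIGIT>.
    private static final Pattern TOKEN = Pattern.compile("\\p{L}+|\\p{Nd}+");

    // Returns every maximal letter run or digit run, in order of appearance.
    // Anything else (spaces, punctuation) is skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [IBM, Deskstar, IC, 35, L, 060, AVER, 07]
        System.out.println(tokenize("IBM Deskstar IC35L060AVER07"));
    }
}
```

A custom tokenizer generated from the modified .jj file should emit the same sequence for that input, so this can serve as a quick reference when testing it.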

Also, you may have to make the same change in QueryParser.jj, so that query
text is split the same way as the indexed text.

--Peter




On 5/29/02 2:19 AM, "Christian Schrader" <schrader.news@evendi.de> wrote:

> I need to construct a Tokenizer that tokenizes at word/number boundaries, so
> that "IBM Deskstar IC35L060AVER07" would result in the following tokens:
> IBM
> Deskstar
> IC
> 35
> L
> 060
> AVER
> 07
> 
> Has anybody solved this with the StandardTokenizer?
> 
> Christian
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 
> 



