lucene-java-user mailing list archives

From Paul Taylor <>
Subject Re: How do you see if a tokenstream has tokens without consuming the tokens ?
Date Tue, 18 Oct 2011 08:57:15 GMT
On 18/10/2011 06:19, Steven A Rowe wrote:
> Hi Paul,
> You could add a rule to the StandardTokenizer JFlex grammar to handle 
> this case, bypassing its other rules.
Hmm, I don't really understand JFlex. That is a possibility, but I would
prefer to do it in Java code unless JFlex is easy to use.
> Another option is to create a char filter that substitutes 
> PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc.,

Yes, that is how I first did it,
> but only when the entire input consists exclusively of whitespace and 
> punctuation.

but I couldn't work out how to do it only when the input is exclusively
whitespace and punctuation. Any ideas how to solve that?
>   These symbols would then be left intact by StandardTokenizer.
> Steve
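A minimal plain-Java sketch of the "only when the entire input is whitespace and punctuation" check: buffer the whole input first (the decision needs all of it), test every character with Character.isWhitespace and Character.getType, and substitute placeholders only if every character passes. PUNCT-EXCLAMATION and PUNCT-PERIOD come from the thread; the class and method names, and the PUNCT-QUESTION and PUNCT-COMMA placeholders, are hypothetical. Wiring this into an actual Lucene CharFilter or Analyzer, including offset correction, is assumed to happen elsewhere and is not shown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;

/** Hypothetical helper: substitutes punctuation placeholders only for punctuation-only input. */
public final class PunctuationOnlyFilter {

    // PUNCT-EXCLAMATION and PUNCT-PERIOD are from the thread; the rest are hypothetical.
    private static final Map<Character, String> PLACEHOLDERS = Map.of(
            '!', "PUNCT-EXCLAMATION",
            '.', "PUNCT-PERIOD",
            '?', "PUNCT-QUESTION",
            ',', "PUNCT-COMMA");

    private PunctuationOnlyFilter() {}

    /** True if every char is whitespace or belongs to a Unicode punctuation category. */
    public static boolean isOnlyWhitespaceAndPunctuation(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c) && !isPunctuation(c)) {
                return false; // found a letter, digit, symbol, etc.
            }
        }
        return true;
    }

    private static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    /** Replaces mapped punctuation with space-padded placeholders, but only for punctuation-only text. */
    public static String substitute(String text) {
        if (!isOnlyWhitespaceAndPunctuation(text)) {
            return text; // ordinary text is left untouched for the tokenizer
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String placeholder = PLACEHOLDERS.get(c);
            if (placeholder != null) {
                // pad with spaces so adjacent placeholders stay separate words
                out.append(' ').append(placeholder).append(' ');
            } else {
                out.append(c); // whitespace and unmapped punctuation pass through
            }
        }
        return out.toString();
    }

    /** Reads the whole input (required to make the decision) and returns a Reader over the result. */
    public static Reader wrap(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.append(chunk, 0, n);
        }
        return new StringReader(substitute(buf.toString()));
    }

    public static void main(String[] args) {
        System.out.println("[" + substitute("!!! ...") + "]");
        System.out.println("[" + substitute("Help!") + "]");
    }
}
```

Buffering the whole input is the price of a whole-input decision. It is cheap for short fields such as titles, but it means this approach cannot stream very large inputs.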
