lucene-java-user mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: How do you see if a tokenstream has tokens without consuming the tokens ?
Date Wed, 19 Oct 2011 14:14:52 GMT
Hi Paul,

On 10/19/2011 at 5:26 AM, Paul Taylor wrote:
> On 18/10/2011 15:25, Steven A Rowe wrote:
> > On 10/18/2011 at 4:57 AM, Paul Taylor wrote:
> > > On 18/10/2011 06:19, Steven A Rowe wrote:
> > > > Another option is to create a char filter that substitutes
> > > > PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods,
> > > > etc.,
> > >
> > > Yes that is how I first did it
> >
> > No, I don't think you did.  When I say "char filter" I'm referring to
> > CharFilter [snip]
>
> If you look at the code you can see I do use a CharFilter: [snip]

I apologize; you're obviously right. I hadn't looked at your code.

> > If you go with a CharFilter, you can give it access to the entire input
> > at once, and use a regular expression (or something like it) to assess
> > the input and then behave accordingly.
>
> Well, this is the problem: you can't use a regular expression, or even if
> you did, that would really slow things down, wouldn't it, seeing as 99%
> don't need the transformation.

PatternReplaceCharFilter might do the trick. It may be worth a test to see
whether it's performant enough.
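
To make the "assess the input, then behave accordingly" idea concrete, here is a minimal plain-JDK sketch. It does not use the Lucene API. The class name PunctuationPrecheck, the "input is only punctuation and whitespace" condition, and the choice to handle just "!" and "." are assumptions made for illustration. The sketch also ignores offset correction, which a real CharFilter would be responsible for. The point is only that the 99% case costs a single regex match and no rewriting.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class PunctuationPrecheck {
    // Assumption: only inputs made up entirely of punctuation and
    // whitespace need the PUNCT-* substitution.
    private static final Pattern ONLY_PUNCT =
            Pattern.compile("[\\p{Punct}\\s]+");

    // Returns the text unchanged unless it is punctuation-only.
    static String rewrite(String text) {
        if (!ONLY_PUNCT.matcher(text).matches()) {
            return text; // common case: no transformation
        }
        // String.replace is literal here, so "." is not a regex wildcard.
        return text.replace("!", "PUNCT-EXCLAMATION ")
                   .replace(".", "PUNCT-PERIOD ")
                   .trim();
    }

    // Reads the entire input up front, then hands back a Reader over
    // the (possibly rewritten) text for a tokenizer to consume.
    static Reader wrap(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return new StringReader(rewrite(sb.toString()));
    }
}
```

Timing rewrite() over a representative sample of your inputs, with and without the up-front match, would be one way to judge whether the regex cost matters in practice.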

Steve