lucene-java-user mailing list archives

From Steven A Rowe
Subject RE: Can I omit ShingleFilter's filler tokens
Date Thu, 12 May 2011 17:03:56 GMT
A thought: one way to do #1 without modifying ShingleFilter: if there were a StopFilter variant
that accepted regular expressions instead of a stopword list, you could configure it with
a regex like /_ .*|.* _|.* _ .*/ (assuming a full match is required, i.e. implicit beginning and
end anchors), and place it in the analysis pipeline after ShingleFilter to throw out shingles
with filler tokens in them.

(I think it would be useful to generalize StopFilter to allow for more sources of stoppage,
rather than just creating a StopRegexFilter with no relation to StopFilter.)
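
To make the regex idea concrete, here is a minimal sketch in plain Java that applies such a pattern to shingle strings outside of Lucene. It is not a TokenFilter and not an existing Lucene class: the class name ShingleFillerDemo and the method dropFillerShingles are hypothetical, and the pattern is a variant of the one suggested above, written so that a filler is caught at the start, middle, or end of a shingle. It assumes the filler token is "_" and tokens are joined by a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates the matching logic.
class ShingleFillerDemo {
    // Assumption: filler token is "_" and the token separator is one space.
    // Used with matches(), so the whole shingle must match (implicit anchors).
    // The pattern accepts any shingle in which "_" appears as a whole token.
    static final Pattern HAS_FILLER = Pattern.compile("(?:.* )?_(?: .*)?");

    // Returns the shingles that contain no filler token, in original order.
    static List<String> dropFillerShingles(List<String> shingles) {
        List<String> kept = new ArrayList<>();
        for (String shingle : shingles) {
            if (!HAS_FILLER.matcher(shingle).matches()) {
                kept.add(shingle);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> shingles = Arrays.asList(
                "quick fox", "_ fox", "quick _", "quick _ fox", "snake_case name");
        System.out.println(dropFillerShingles(shingles));
    }
}
```

Requiring the underscore to be a whole token means a term such as "snake_case" is not mistaken for a filler; inside a real analysis chain the same test would be applied to each token's term text after ShingleFilter.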


> -----Original Message-----
> From: Elmo Bleek
> Sent: Thursday, May 12, 2011 12:51 PM
> Subject: Re: Can I omit ShingleFilter's filler tokens
> I have found that simply placing StopFilter before ShingleFilter does the
> trick for #2. However, I have also been trying to accomplish #1: not
> creating shingles across stop words. I am currently under the impression
> that this will require modifying ShingleFilter. Does anyone have any
> suggestions?
