lucene-java-user mailing list archives

From "Igal @" <>
Subject Removing Empty Shingles in Lucene 4
Date Thu, 01 Nov 2012 19:44:01 GMT

I'm trying to migrate to Lucene 4.

In Lucene 3.5 I extended org.apache.lucene.analysis.FilteringTokenFilter
and overrode accept() to remove undesired shingles. In Lucene 4,
org.apache.lucene.analysis.FilteringTokenFilter does not seem to exist.
Has it been moved or replaced?

I'm trying to achieve two things:

1) Remove shingles that contain an empty item.

2) Remove shingles that span a comma in the phrase. For example,

     for the phrase:    "delicious red apples, green pears, and oranges"

I want the following shingles (with a shingle size of 2):

"delicious red", "red apples", "green pears", "and oranges"
(no "apples green" because there's a comma)
(no "pears and" because there's a comma)

Any ideas?
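
To pin down the expected output, here is a minimal plain-Java sketch of the two rules above. It is not a Lucene TokenFilter and uses no Lucene classes; the class name ShingleSketch and the method shingles() are hypothetical names made up for illustration. It assumes tokens are separated by whitespace and that a comma always ends a segment.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {

    // Builds word shingles of exactly `size` tokens. Shingles never cross
    // a comma and never contain an empty token.
    public static List<String> shingles(String phrase, int size) {
        List<String> result = new ArrayList<>();
        // A comma ends the current segment; shingles are built per segment.
        for (String segment : phrase.split(",")) {
            List<String> tokens = new ArrayList<>();
            for (String token : segment.trim().split("\\s+")) {
                if (!token.isEmpty()) { // rule 1: drop empty items
                    tokens.add(token);
                }
            }
            // Rule 2 follows from working inside one segment at a time.
            for (int i = 0; i + size <= tokens.size(); i++) {
                result.add(String.join(" ", tokens.subList(i, i + size)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [delicious red, red apples, green pears, and oranges]
        System.out.println(
            shingles("delicious red apples, green pears, and oranges", 2));
    }
}
```

In a real analyzer chain the same decision would have to be made per token inside a filter's accept() method, which this sketch does not attempt to show.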

