lucene-dev mailing list archives

From Mathieu Lecarme <>
Subject shingles and punctuations
Date Sun, 06 Apr 2008 17:23:58 GMT
The new ShingleFilter is very helpful for extracting groups of words, but
it doesn't handle punctuation or any other kind of separator.
If you feed it multiple sentences, you will get shingles that
start in one sentence and end in the next.
One way to avoid that is to look at the character offsets of the tokens: if
more than one character separates a token from the previous one, the gap
probably contains punctuation (or a typo).
Any suggestions for restricting shingles to a single sentence?
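
To make the offset heuristic above concrete, here is a minimal sketch in plain Python. It does not use Lucene or ShingleFilter at all: the names `tokenize`, `shingles`, `size` and `max_gap` are hypothetical, and the regex tokenizer is only a stand-in for a real analyzer that reports start and end offsets. It simply resets the shingle window whenever the gap between two tokens is wider than one character.

```python
import re


def tokenize(text):
    """Return (term, start, end) triples for each run of word characters."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def shingles(text, size=2, max_gap=1):
    """Build word shingles that never span a gap wider than max_gap chars.

    Assumption: a gap of more than max_gap characters between the end of
    one token and the start of the next means punctuation (or a typo).
    """
    out = []
    run = []          # tokens seen since the last detected break
    prev_end = None   # end offset of the previous token
    for term, start, end in tokenize(text):
        if prev_end is not None and start - prev_end > max_gap:
            run = []  # gap too wide: start a new shingle window
        run.append(term)
        if len(run) >= size:
            out.append(" ".join(run[-size:]))
        prev_end = end
    return out


print(shingles("The cat sat. The dog ran."))
# ['the cat', 'cat sat', 'the dog', 'dog ran']
```

Note that this heuristic breaks on any punctuation followed by a space, so commas split shingles as well as full stops, and a doubled space is treated as a break too.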

