lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: shingles and punctuations
Date Sun, 06 Apr 2008 18:13:25 GMT
I think you need sentence detection to take place further upstream.
Then you could use the Token type or Token flags to mark punctuation,
sentence boundaries or whatever else you need, and we could patch
ShingleFilter to either ignore those tokens or stop the current shingle
and move on to the next one.
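The suggestion above can be sketched outside Lucene as two stages: an
upstream pass that flags the first token of each sentence, and a shingle
builder that stops extending a shingle as soon as it reaches a flagged
token. This is a hypothetical standalone illustration, not Lucene's API:
SimpleToken, SENTENCE_START, tokenize and shingles are made-up names, and
the "sentence detection" is a naive split on '.', '!' and '?' standing in
for a real sentence detector. It only emits multi-word shingles.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (hypothetical names, not Lucene classes); records need Java 16+.
class SentenceAwareShingles {

    /** Hypothetical flag bit meaning "this token starts a new sentence". */
    static final int SENTENCE_START = 1;

    /** Minimal stand-in for a token: its text plus a flags bitmask. */
    record SimpleToken(String term, int flags) {}

    /**
     * Naive upstream "sentence detection": split on whitespace, strip trailing
     * punctuation, and flag the token that follows a '.', '!' or '?'.
     */
    static List<SimpleToken> tokenize(String text) {
        List<SimpleToken> tokens = new ArrayList<>();
        boolean newSentence = true;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            boolean endsSentence = raw.matches(".*[.!?]$");
            String term = raw.replaceAll("[.!?,;:]+$", "");
            if (!term.isEmpty()) {
                tokens.add(new SimpleToken(term, newSentence ? SENTENCE_START : 0));
                newSentence = false;
            }
            if (endsSentence) newSentence = true;
        }
        return tokens;
    }

    /** Emits shingles of 2..maxSize words that never cross a sentence start. */
    static List<String> shingles(List<SimpleToken> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder(tokens.get(i).term());
            for (int j = i + 1; j < tokens.size() && j - i < maxSize; j++) {
                // Flagged token = new sentence: stop this shingle, move on to the next start.
                if ((tokens.get(j).flags() & SENTENCE_START) != 0) break;
                sb.append(' ').append(tokens.get(j).term());
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleToken> tokens = tokenize("The cat sat. A dog barked!");
        System.out.println(shingles(tokens, 2)); // prints [The cat, cat sat, A dog, dog barked]
    }
}
```

Note that "sat A" is never produced: the shingle starting at "sat" is
abandoned because "A" carries the sentence-start flag.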


On Apr 6, 2008, at 7:23 PM, Mathieu Lecarme wrote:

> The new ShingleFilter is very helpful for extracting groups of words, but
> it doesn't handle punctuation or any other separator.
> If you feed it multiple sentences, you will get shingles that
> start in one sentence and end in the next.
> To avoid that, you could look at the token offsets (character
> positions): if there is more than one character between the previous
> token and the current one, it is probably punctuation (or a typo).
> Any suggestions for producing only shingles within the same sentence?
> M.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
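The offset heuristic from the quoted message can be sketched as follows.
This is a minimal standalone illustration, not Lucene code: OffsetToken,
tokenize, gapBreak and shingles are hypothetical names, and the tokenizer
simply takes runs of letters and digits while recording start and end
character offsets, in the same spirit as the offsets Lucene tokens carry.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (hypothetical names, not Lucene classes); records need Java 16+.
class OffsetGapShingles {

    /** Hypothetical token with [startOffset, endOffset) into the original text. */
    record OffsetToken(String term, int startOffset, int endOffset) {}

    /** Splits text into runs of letters/digits, recording character offsets. */
    static List<OffsetToken> tokenize(String text) {
        List<OffsetToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                tokens.add(new OffsetToken(text.substring(start, i), start, i));
            } else {
                i++;
            }
        }
        return tokens;
    }

    /** The heuristic: more than one character between two tokens means a break. */
    static boolean gapBreak(OffsetToken prev, OffsetToken cur) {
        return cur.startOffset() - prev.endOffset() > 1;
    }

    /** Emits shingles of 2..maxSize words that never span a gap of more than one character. */
    static List<String> shingles(List<OffsetToken> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder(tokens.get(i).term());
            for (int j = i + 1; j < tokens.size() && j - i < maxSize; j++) {
                if (gapBreak(tokens.get(j - 1), tokens.get(j))) break;
                sb.append(' ').append(tokens.get(j).term());
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<OffsetToken> tokens = tokenize("The cat sat. A dog, barked");
        System.out.println(shingles(tokens, 2)); // prints [The cat, cat sat, A dog]
    }
}
```

As the example shows, the heuristic breaks on any punctuation, so the
comma in "dog, barked" splits a shingle just like the period does, and a
double space (the "typo" case) would as well. That is one reason to prefer
explicit sentence detection upstream.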
