lucene-dev mailing list archives

From Mathieu Lecarme <math...@garambrogne.net>
Subject Re: shingles and punctuations
Date Sun, 06 Apr 2008 20:43:25 GMT
I'll use Token flags to mark the first token in a sentence, but how
does that work? How are flag collisions avoided? To keep it simple, I'll
take 1 as the flag, but what happens if another filter uses the same flag?
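
To make the collision question concrete, here is a minimal sketch in plain Java, assuming the flags are an ordinary int used as a bitset (which is how I understand Token.getFlags()/setFlags(int)). The class name, constant names and bit assignments are hypothetical; as far as I know nothing in Lucene reserves or registers bits, so two filters that both pick 1 would collide unless they agree on distinct bits by convention.

```java
final class FlagBits {
    // Hypothetical bit assignments; nothing in Lucene reserves these values.
    static final int SENTENCE_START = 1;          // bit 0, the "1" from the message
    static final int OTHER_FILTER_FLAG = 1 << 1;  // bit 1, owned by some other filter

    // Set one bit without clobbering bits set by other filters.
    static int mark(int flags, int bit) {
        return flags | bit;
    }

    // Test one bit, ignoring all the others.
    static boolean isSet(int flags, int bit) {
        return (flags & bit) != 0;
    }

    // Clear one bit, leaving the others untouched.
    static int clear(int flags, int bit) {
        return flags & ~bit;
    }

    public static void main(String[] args) {
        int flags = 0;
        flags = mark(flags, OTHER_FILTER_FLAG);
        flags = mark(flags, SENTENCE_START);
        System.out.println(isSet(flags, SENTENCE_START));     // prints "true"
        System.out.println(isSet(flags, OTHER_FILTER_FLAG));  // prints "true"
    }
}
```

The point of OR-ing instead of assigning is that a filter which writes the flags wholesale would erase whatever an upstream filter stored there.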

M.
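
For reference, a rough sketch of the behaviour discussed in the quoted thread below: shingles that never span a token flagged as a sentence start. This is plain Java with a hypothetical Tok stand-in, not the actual ShingleFilter code, and the SENTENCE_START bit is an assumption carried over from the question above.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceShingles {
    static final int SENTENCE_START = 1; // hypothetical flag bit

    // Minimal stand-in for a Lucene Token: just text and flags.
    static final class Tok {
        final String text;
        final int flags;

        Tok(String text, int flags) {
            this.text = text;
            this.flags = flags;
        }
    }

    // Build shingles of exactly `size` words that never cross a sentence start.
    static List<String> shingles(List<Tok> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            StringBuilder sb = new StringBuilder(tokens.get(i).text);
            boolean crosses = false;
            for (int j = i + 1; j < i + size; j++) {
                // A flagged token after the first one means a new sentence began.
                if ((tokens.get(j).flags & SENTENCE_START) != 0) {
                    crosses = true;
                    break;
                }
                sb.append(' ').append(tokens.get(j).text);
            }
            if (!crosses) {
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("hello", SENTENCE_START));
        toks.add(new Tok("world", 0));
        toks.add(new Tok("good", SENTENCE_START));
        toks.add(new Tok("bye", 0));
        System.out.println(shingles(toks, 2)); // prints "[hello world, good bye]"
    }
}
```

Without the flag check the same input would also yield "world good", which is exactly the cross-sentence shingle the thread wants to avoid.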

On 6 Apr 2008, at 20:13, Grant Ingersoll wrote:
> I think you need sentence detection to take place further upstream.   
> Then you could use the Token type or Token flags to indicate  
> punctuation, sentences, whatever and we could patch the shingle  
> filter to ignore these things, or break and move onto the next one.
>
> -Grant
>
> On Apr 6, 2008, at 7:23 PM, Mathieu Lecarme wrote:
>
>> The new ShingleFilter is very helpful for fetching groups of words,
>> but it doesn't handle punctuation or any other separation.
>> If you feed it multiple sentences, you will get shingles that
>> start in one sentence and end in the next.
>> To avoid that, you can look at token positions: if more than one
>> character separates a token from the previous one, the gap should be
>> punctuation (or a typo).
>> Any suggestions for keeping shingles within a single sentence?
>>
>> M.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

