lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
Date Fri, 03 May 2013 20:39:52 GMT
An issue exists for this problem: https://issues.apache.org/jira/browse/LUCENE-3475
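
For readers of the archive, the behavior discussed in this thread can be illustrated with a small sketch. This is a simplified model in plain Python, not Lucene's ShingleFilter code: tokens are hypothetical (term, position) pairs, and the function names are made up for illustration. It assumes the word delimiter step emits "women's" and "womens" at the same position, as described in the original question below.

```python
from collections import defaultdict
from itertools import product

def naive_shingles(tokens, size=2):
    """Join every run of `size` adjacent tokens in stream order, ignoring positions."""
    terms = [term for term, _ in tokens]
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]

def position_aware_shingles(tokens, size=2):
    """Build shingles only from tokens at different positions.

    Tokens sharing a position are treated as alternatives, so each shingle
    takes exactly one term per position. Position gaps are ignored here.
    """
    by_pos = defaultdict(list)
    for term, pos in tokens:
        by_pos[pos].append(term)
    positions = sorted(by_pos)
    shingles = []
    for i in range(len(positions) - size + 1):
        window = positions[i:i + size]
        # One term from each position in the window, in every combination.
        for combo in product(*(by_pos[p] for p in window)):
            shingles.append(" ".join(combo))
    return shingles

# Hypothetical token stream for "women's shoes": two tokens share position 0.
tokens = [("women's", 0), ("womens", 0), ("shoes", 1)]

print(naive_shingles(tokens))
print(position_aware_shingles(tokens))
```

The naive version pairs "women's" with "womens" because they are neighbors in the stream, while the position-aware version only pairs each of them with "shoes".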

On May 3, 2013, at 11:00 AM, Walter Underwood <wunder@wunderwood.org> wrote:

> The shingle filter should respect positions. If it doesn't, that is worth filing a bug so we know about it.
> 
> wunder
> 
> On May 3, 2013, at 10:50 AM, Jack Krupansky wrote:
> 
>> In short, no. I don't think you want to use the shingle filter on a token stream that has multiple tokens at the same position; otherwise, you will get confused "suggestions", as you've encountered.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Rounak Jain
>> Sent: Friday, May 03, 2013 7:34 AM
>> To: solr-user@lucene.apache.org
>> Subject: Configure Shingle Filter to ignore ngrams made of tokens with same start and end
>> 
>> Hello,
>> 
>> I was using Shingle Filter with Suggester to implement an autosuggest
>> dropdown. The field I'm using with shingle filter has a word delimiter filter with
>> preserveOriginal=1 to tokenize "women's" as "women's" and "womens".
>> 
>> Because of this, when shingle filter is generating word ngrams, apart from
>> the expected tokens, there's also a "women's womens" token. I wanted to
>> know if there's any way to configure ShingleFilter so that it ignores
>> tokens with the same start and end values.
>> 
>> Thanks,
>> Rounak 