lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: ShingleFilter with outputUnigrams=false
Date Fri, 08 Jan 2010 21:29:08 GMT
You can find the issue for this here:
https://issues.apache.org/jira/browse/LUCENE-2199

On Fri, Jan 8, 2010 at 8:53 PM, Simon Willnauer
<simon.willnauer@googlemail.com> wrote:
> This is truly a bug. Internally, outputUnigrams only works if you
> request bi-grams.
> If outputUnigrams is set to false, the filter increments the
> shingleposition by one and therefore skips every even shingle. The
> position should only be incremented if shingleBufferPosition %
> maxShingle == 0.
>
> I have a test and the fix - will open an issue soon.
>
> simon
>
> On Fri, Jan 8, 2010 at 7:48 PM, Chris Hostetter
> <hossman_lucene@fucit.org> wrote:
>>
>> : I am using lucene 2.9.1 and I was trying to understand the ShingleFilter and wrote the code below.
>>        ...
>> : I was expecting the output as follows with maxShingleSize=3 and outputUnigrams=false:
>>        ...
>> : Am I missing something or this is the expected behavior?
>>
>> I'm not very familiar with ShingleFilter, and I'm not 100% sure I
>> understand the example you describe, but it *seems* like there may be a
>> bug here ... the easiest way to verify that is if you could tweak your
>> example code into the form of a (failing) JUnit test and open a new Jira
>> issue -- then other devs (who know more about ShingleFilter) could look at
>> it and either verify that there is a bug, or point out what's invalid
>> about the test.
>>
>>
>>
>> -Hoss
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
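
For readers following the quoted thread, here is a minimal sketch of the output one would presumably expect from a shingle filter when unigrams are suppressed: every shingle of 2 up to maxShingleSize tokens at each position, with none skipped. This is plain Java and does not use or reproduce Lucene's ShingleFilter. The class ShingleSketch, the helper shingles(), the single-space separator, and the sample tokens are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper, not part of Lucene: builds word n-grams
    // ("shingles") of up to maxShingleSize tokens starting at each position.
    // Assumes a single space as the token separator.
    static List<String> shingles(List<String> tokens, int maxShingleSize,
                                 boolean outputUnigrams) {
        List<String> out = new ArrayList<String>();
        // With outputUnigrams=false, only shingles of size >= 2 are emitted.
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int size = 1;
                 size <= maxShingleSize && start + size <= tokens.size();
                 size++) {
                if (size > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + size - 1));
                if (size >= minSize) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        for (String s : shingles(tokens, 3, false)) {
            System.out.println(s);
        }
    }
}
```

A failing JUnit test against the real filter, as Hoss suggests above, could compare the filter's tokens with a list built this way.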
