lucene-java-user mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: ShingleFilter with outputUnigrams=false
Date Fri, 08 Jan 2010 19:53:53 GMT
This is truly a bug. Internally, outputUnigrams=false only works correctly if you
request bigrams (maxShingleSize=2).
If outputUnigrams is set to false, the filter increments the
shingle position by one and therefore skips every even shingle. The
position should only be incremented if shingleBufferPosition %
maxShingle == 0.
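To make the expected output concrete, here is a minimal plain-Java sketch of word shingling. It is a hypothetical model, not Lucene's ShingleFilter code: the class ShingleSketch and the method shingles are invented for illustration. It assumes shingles are emitted per start position, shortest first, and that near the end of the stream only the shingles that still fit are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of word shingling; NOT Lucene's ShingleFilter implementation. */
public class ShingleSketch {

    /**
     * Returns every shingle of size minSize..maxShingleSize at each start position,
     * where minSize is 1 if unigrams are requested and 2 otherwise.
     */
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        int minSize = outputUnigrams ? 1 : 2;
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the shingle until it hits maxShingleSize or runs past the last token.
            for (int size = minSize; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // maxShingleSize=3, outputUnigrams=false: the combination from the original report.
        System.out.println(shingles(tokens, 3, false));
        // maxShingleSize=2, outputUnigrams=false: bigrams only.
        System.out.println(shingles(tokens, 2, false));
    }
}
```

In this model, with outputUnigrams=false each start position contributes up to two shingles (a bigram and a trigram) when maxShingleSize=3, but at most one when maxShingleSize=2. That is consistent with the observation above that the position bookkeeping only goes wrong once more than one shingle per position has to be emitted.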

I have a test and the fix - will open an issue soon.

simon

On Fri, Jan 8, 2010 at 7:48 PM, Chris Hostetter
<hossman_lucene@fucit.org> wrote:
>
> : I am using lucene 2.9.1 and I was trying to understand the ShingleFilter and wrote the code below.
>        ...
> : I was expecting the output as follows with maxShingleSize=3 and outputUnigrams=false:
>        ...
> : Am I missing something, or is this the expected behavior?
>
> I'm not very familiar with ShingleFilter, and I'm not 100% sure I
> understand the example you describe, but it *seems* like there may be a
> bug here. The easiest way to verify that would be to turn your
> example code into a (failing) JUnit test and open a new Jira
> issue. Then other devs (who know more about ShingleFilter) could look at
> it and either confirm that there is a bug or point out what's invalid
> about the test.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


