lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1903) Incorrect ShingleFilter behavior when outputUnigrams == false
Date Tue, 08 Sep 2009 23:01:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752813#action_12752813 ]

Uwe Schindler commented on LUCENE-1903:
---------------------------------------

Chris: Could you test this patch and check whether it also works as expected for you? I may have
found a fix that is only valid for your test case but not for other cases. In my opinion, the code
now works identically to the 2.4.1 version (without the output buffer). Unigrams are simply detected
by shingleBufferPosition==0. The position increments are also tested by your code, and the
implementation looks right. In principle, the only thing missing was the increment of
shingleBufferPosition when no unigrams are output.
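
To illustrate the behavior described above, here is a minimal, self-contained Java sketch. It is
not the ShingleFilter code and not the attached patch: the names ShingleSketch, Shingle, and
shingles are hypothetical, it works on a plain list of strings instead of a TokenStream, and the
position-increment convention (1 for the first token emitted at a position, 0 for the rest) is an
assumption made for the illustration. It only shows the idea that a unigram is the entry at
buffer position 0, and that the position must still advance when that entry is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's ShingleFilter.
class ShingleSketch {

    /** One emitted token: its text and its (assumed) position increment. */
    static final class Shingle {
        final String text;
        final int positionIncrement;

        Shingle(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Builds shingles of up to maxShingleSize words for each start token.
     * The entry at bufferPosition == 0 is the unigram; it is skipped when
     * outputUnigrams is false, but the buffer position still advances.
     */
    static List<Shingle> shingles(List<String> tokens, int maxShingleSize,
                                  boolean outputUnigrams) {
        List<Shingle> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder text = new StringBuilder();
            boolean firstEmittedHere = true;
            for (int bufferPosition = 0;
                 bufferPosition < maxShingleSize
                     && start + bufferPosition < tokens.size();
                 bufferPosition++) {
                if (bufferPosition > 0) {
                    text.append(' ');
                }
                text.append(tokens.get(start + bufferPosition));

                boolean isUnigram = (bufferPosition == 0);
                if (isUnigram && !outputUnigrams) {
                    // Skip the unigram; the loop increment still moves on
                    // to the next buffer position.
                    continue;
                }
                // Assumed convention: the first token emitted for a start
                // position gets increment 1, later ones get 0.
                out.add(new Shingle(text.toString(), firstEmittedHere ? 1 : 0));
                firstEmittedHere = false;
            }
        }
        return out;
    }
}
```

With the tokens "please", "divide", "this", a maximum shingle size of 2, and outputUnigrams set to
false, this sketch emits only "please divide" and "divide this", each with increment 1.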

Mark: If you want to build RC3 soon, just assign the issue to yourself and commit this fix. I am
going to bed now. I will commit this tomorrow if you haven't done so by then. A CHANGES.txt entry
is not needed in my opinion, as this is neither a new feature nor a bug that existed in 2.4.1.

> Incorrect ShingleFilter behavior when outputUnigrams == false
> -------------------------------------------------------------
>
>                 Key: LUCENE-1903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1903
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Chris Harris
>             Fix For: 2.9
>
>         Attachments: LUCENE-1903.patch, LUCENE-1903_testcases.patch,
>                      LUCENE-1903_testcases_lucene2_4_1_version.patch,
>                      TEST-org.apache.lucene.analysis.shingle.ShingleFilterTest.xml
>
>
> ShingleFilter isn't working as expected when outputUnigrams == false. In particular,
> it is outputting unigrams at least some of the time when outputUnigrams==false.
> I'll attach a patch to ShingleFilterTest.java that adds some test cases that demonstrate
> the problem.
> I haven't checked this, but I hypothesize that the behavior for outputUnigrams == false
> got changed when the class was upgraded to the new TokenStream API?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

