lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1903) Incorrect ShingleFilter behavior when outputUnigrams == false
Date Tue, 08 Sep 2009 23:34:57 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752821#action_12752821 ]

Robert Muir edited comment on LUCENE-1903 at 9/8/09 4:34 PM:
-------------------------------------------------------------

Here is what I think.
This is what Michael Busch said in LUCENE-1775:

{quote}
ShingleFilter and ShingleFilterTest are converted to the new API.

ShingleFilter is much more efficient now, it clones much less often and computes the tokens
mostly on the fly now. 
{quote}

The fact that it went to the new API appears to have made it into CHANGES, but not the
fact that it is more efficient.
So maybe CHANGES could mention not only that it went to the new API, but also that it is
more efficient, and that Chris & Uwe added additional tests and fixed bugs/ensured
correctness?

By the way, you can take my name off the existing CHANGES entry if you want, I did nothing :)

> Incorrect ShingleFilter behavior when outputUnigrams == false
> -------------------------------------------------------------
>
>                 Key: LUCENE-1903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1903
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Chris Harris
>             Fix For: 2.9
>
>         Attachments: LUCENE-1903.patch, LUCENE-1903_testcases.patch, LUCENE-1903_testcases_lucene2_4_1_version.patch,
>                      TEST-org.apache.lucene.analysis.shingle.ShingleFilterTest.xml
>
>
> ShingleFilter isn't working as expected when outputUnigrams == false. In particular,
> it is outputting unigrams at least some of the time when outputUnigrams==false.
> I'll attach a patch to ShingleFilterTest.java that adds some test cases that demonstrate
> the problem.
> I haven't checked this, but I hypothesize that the behavior for outputUnigrams == false
> got changed when the class was upgraded to the new TokenStream API?
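
For illustration only, the expected behavior described in the report can be sketched without Lucene. The class `ShingleSketch` and its `shingles` helper below are hypothetical names, not Lucene's ShingleFilter API, and the sketch assumes shingles are adjacent tokens joined by a single space. With the flag set to false, no single-token output should appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the behavior under discussion; not Lucene code.
public class ShingleSketch {

    // Builds shingles (token n-grams) of size 2..maxShingleSize for each start
    // position. Unigrams are emitted only when outputUnigrams is true.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start));
            }
            StringBuilder sb = new StringBuilder(tokens.get(start));
            for (int n = 2; n <= maxShingleSize && start + n <= tokens.size(); n++) {
                sb.append(' ').append(tokens.get(start + n - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // With unigrams: single tokens interleaved with bigrams.
        System.out.println(shingles(tokens, 2, true));
        // Without unigrams: bigrams only, which is what the report expects.
        System.out.println(shingles(tokens, 2, false));
    }
}
```

Under these assumptions, the second call yields only the three bigrams; any single token in that output would correspond to the bug reported here.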

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
