lucene-dev mailing list archives

From "Robert Muir (JIRA)"
Subject [jira] Commented: (LUCENE-1903) Incorrect ShingleFilter behavior when outputUnigrams == false
Date Tue, 08 Sep 2009 23:21:57 GMT


Robert Muir commented on LUCENE-1903:

Here is what I think.
This is what Michael Busch said in LUCENE-1775:

ShingleFilter and ShingleFilterTest are converted to the new API.

ShingleFilter is much more efficient now, it clones much less often and computes the tokens
mostly on the fly now. 

The fact that it went to the new API appears to have made it into CHANGES, but not the fact
that it is more efficient.
So maybe CHANGES could mention not only that it went to the new API, but also that it is
more efficient and that Chris & Uwe added additional tests and fixed bugs/ensured

> Incorrect ShingleFilter behavior when outputUnigrams == false
> -------------------------------------------------------------
>                 Key: LUCENE-1903
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Chris Harris
>             Fix For: 2.9
>         Attachments: LUCENE-1903.patch, LUCENE-1903_testcases.patch, LUCENE-1903_testcases_lucene2_4_1_version.patch,
> ShingleFilter isn't working as expected when outputUnigrams == false. In particular,
> it is outputting unigrams at least some of the time when outputUnigrams == false.
> I'll attach a patch that adds some test cases that demonstrate the problem.
> I haven't checked this, but I hypothesize that the behavior for outputUnigrams == false
> got changed when the class was upgraded to the new TokenStream API?
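
To make the expected behavior concrete, here is a minimal plain-Java sketch of shingle generation. It is not Lucene's ShingleFilter code and does not use the TokenStream API; the class name ShingleSketch, the method shingles, and the sample tokens are all hypothetical, chosen only for illustration. It ignores position increments, offsets, and the case where the input is shorter than the shingle size. The point is simply that with outputUnigrams == false, no single-word terms should appear in the output.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is NOT Lucene's ShingleFilter implementation.
class ShingleSketch {

    // Returns the terms one would expect for `tokens`: shingles of 2..maxShingleSize
    // words joined by a single space, plus the single words if outputUnigrams is true.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start)); // the unigram at this position
            }
            StringBuilder sb = new StringBuilder(tokens.get(start));
            // Grow the shingle one word at a time until the size limit or end of input.
            for (int size = 2; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                sb.append(' ').append(tokens.get(start + size - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, true));
        System.out.println(shingles(tokens, 2, false));
    }
}
```

Under these assumptions, the second call yields only the bigrams "please divide", "divide this", and "this sentence". The bug reported here is that the real filter sometimes emits unigrams as well in that configuration.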

