lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3961) LimitTokenCountFilterFactory config parsing is totally broken
Date Wed, 17 Oct 2012 23:14:03 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478487#comment-13478487
] 

Robert Muir commented on SOLR-3961:
-----------------------------------

{quote}
HunspellStemFilterTest is the only lucene/analysis test i see using setEnableChecks
{quote}

It sets it to true, which is dead code (true is the default!).

{quote}
although there do seem to be some highlighter tests that use it
{quote}

Highlighter has a built-in limiter, but it limits based on the accumulated number of analyzed
chars rather than on token count.
So it disables the checks for the same reason LimitTokenCount does (or should).

{quote}
2) i don't see any existing tests for LimitTokenCountFilter .. were they deleted by mistake?
{quote}

I think these are in TestLimitTokenCountAnalyzer? For Lucene users this is the way you use
this (just wrap your analyzer).

{quote}
3) the closest thing i see to a test of LimitTokenCountFilter is TestLimitTokenCountAnalyzer
- i realize now the reason its testLimitTokenCountAnalyzer doesn't get the same failure is
because it's wrapping WhitespaceAnalyzer, StandardAnalyzer - should those be changed to use
MockTokenizer?
{quote}

I think we should always do this!

{quote}
4) TestLimitTokenCountAnalyzer also has a testLimitTokenCountIndexWriter that uses MockAnalyzer
w/o calling setEnableChecks(false) which seems like it should trigger the same failure i got
since it uses MockTokenizer, but in general that test looks suspicious, as it seems to add
the exact number of tokens that the limit is configured for, and then asserts that the last
token is in the index - but never actually triggers the limiting logic since exactly the allowed
number of tokens are used.
{quote}

Then that's fine, because when LimitTokenCountFilter consumes the whole stream, it's a "good
consumer". It's only when it actually truncates the stream that it breaks the tokenstream contract.
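The contract in question can be sketched with a toy model (these are not Lucene's actual classes, just an illustration of the check that MockTokenizer-style instrumentation enforces): end() is only legal once incrementToken() has returned false, so a limiting filter that stops pulling tokens early leaves the wrapped stream's check unsatisfied, while a filter that drains its input passes.

```java
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a checked tokenizer: tracks whether the consumer
// followed the contract (incrementToken() until false, then end()).
class CheckedTokenStream {
    enum State { INCREMENT, EXHAUSTED, END }
    private State state = State.INCREMENT;
    private final Iterator<String> tokens;

    CheckedTokenStream(List<String> tokens) { this.tokens = tokens.iterator(); }

    boolean incrementToken() {
        if (state == State.END)
            throw new IllegalStateException("incrementToken() after end()");
        if (tokens.hasNext()) { tokens.next(); return true; }
        state = State.EXHAUSTED;
        return false;
    }

    // Only legal once incrementToken() has returned false.
    void end() {
        if (state != State.EXHAUSTED)
            throw new IllegalStateException("end() called before stream was exhausted");
        state = State.END;
    }
}

// Toy limiting filter: stops emitting after 'limit' tokens, possibly
// without ever exhausting its input.
class LimitFilter {
    private final CheckedTokenStream input;
    private final int limit;
    private int count = 0;

    LimitFilter(CheckedTokenStream input, int limit) { this.input = input; this.limit = limit; }

    boolean incrementToken() {
        if (count < limit && input.incrementToken()) { count++; return true; }
        return false; // may stop while the input still has tokens
    }

    void end() { input.end(); } // fails if the input was truncated mid-stream
}

public class ContractDemo {
    // Consume like a well-behaved consumer; return true iff end() succeeded.
    static boolean consume(List<String> tokens, int limit) {
        LimitFilter f = new LimitFilter(new CheckedTokenStream(tokens), limit);
        while (f.incrementToken()) { }
        try { f.end(); return true; } catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        // limit above the token count: filter drains the input, contract holds
        System.out.println(consume(List.of("a", "b", "c"), 5)); // true
        // limit below the token count: truncation leaves input un-exhausted
        System.out.println(consume(List.of("a", "b", "c"), 2)); // false
    }
}
```

This is why only the truncating case needs setEnableChecks(false): when the filter happens to drain its input, the inner stream sees incrementToken() return false and end() is legal.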

> LimitTokenCountFilterFactory config parsing is totally broken
> -------------------------------------------------------------
>
>                 Key: SOLR-3961
>                 URL: https://issues.apache.org/jira/browse/SOLR-3961
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>             Fix For: 4.0.1, 4.1
>
>         Attachments: SOLR-3961.patch, SOLR-3961.patch
>
>
> As noted on the mailing list, LimitTokenCountFilterFactory throws a NumberFormatException
because it tries to use the value of its config param as a key to look up another param that
it parses as an integer ... totally ridiculous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

