lucene-dev mailing list archives

From "Christian Moen (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3897) KuromojiTokenizer fails with large docs
Date Wed, 21 Mar 2012 04:59:40 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234096#comment-13234096 ]

Christian Moen commented on LUCENE-3897:
----------------------------------------

Robert, your change to LUCENE-3895 is very useful.  Thanks again for this.

I can reproduce a failing case on {{trunk}} on my system using:

{noformat}
ant test -Dtestcase=TestKuromojiTokenizer -Dtestmethod=testRandomHugeStrings -Dtests.seed=-42f0565412819c1e:75f7606c1595bc3f:-31754ca508d64340 -Dargs="-Dfile.encoding=MacRoman"
{noformat}

and the output is as follows:

{noformat}
    [junit] Testsuite: org.apache.lucene.analysis.kuromoji.TestKuromojiTokenizer
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 16.122 sec
    [junit] 
    [junit] ------------- Standard Error -----------------
    [junit] NOTE: Ignoring @nightly test method 'testBocchanBig'
    [junit] 
    [junit] ===>
    [junit] Uncaught exception by thread: Thread[Thread-4,5,main]
    [junit] java.lang.AssertionError: backPos=3076 vs lastBackTracePos=4096
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907)
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756)
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:49)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:334)
    [junit] <===
    [junit] 
    [junit] NOTE: reproduce with: ant test -Dtestcase=TestKuromojiTokenizer -Dtestmethod=null -Dtests.seed=-42f0565412819c1e:75f7606c1595bc3f:-31754ca508d64340 -Dargs="-Dfile.encoding=MacRoman"
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testRandomHugeStrings(org.apache.lucene.analysis.kuromoji.TestKuromojiTokenizer): Caused an ERROR
    [junit] Uncaught exception by thread: Thread[Thread-4,5,]
    [junit] org.apache.lucene.util.UncaughtExceptionsRule$UncaughtExceptionsInBackgroundThread: Uncaught exception by thread: Thread[Thread-4,5,]
    [junit] 	at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:66)
    [junit] 	at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:618)
    [junit] 	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
    [junit] 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    [junit] 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164)
    [junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
    [junit] 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    [junit] 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    [junit] 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    [junit] 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    [junit] 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    [junit] 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    [junit] 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
    [junit] 	at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:57)
    [junit] 	at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21)
    [junit] 	at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22)
    [junit] 	at org.junit.rules.RunRules.evaluate(RunRules.java:18)
    [junit] 	at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    [junit] 	at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
    [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
    [junit] Caused by: java.lang.AssertionError: backPos=3076 vs lastBackTracePos=4096
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907)
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756)
    [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.access$000(BaseTokenStreamTestCase.java:49)
    [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:334)
    [junit] 
    [junit] 
    [junit] Test org.apache.lucene.analysis.kuromoji.TestKuromojiTokenizer FAILED
{noformat}
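
For comparing this failure with the one in the issue description quoted below, here is a minimal sketch that extracts the two positions from the assertion message and reports how far backPos falls short of lastBackTracePos. It is not part of Lucene: the class name {{BacktraceAssertGap}} and the method {{gap}} are hypothetical, and the only assumption is that the message has the form shown in the traces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the traces; not Lucene code.
public class BacktraceAssertGap {
    // Matches the message text seen in the traces,
    // e.g. "backPos=3076 vs lastBackTracePos=4096".
    private static final Pattern MSG =
        Pattern.compile("backPos=(\\d+) vs lastBackTracePos=(\\d+)");

    /** Returns lastBackTracePos - backPos, or throws if the message does not match. */
    static int gap(String message) {
        Matcher m = MSG.matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("unexpected message: " + message);
        }
        int backPos = Integer.parseInt(m.group(1));
        int lastBackTracePos = Integer.parseInt(m.group(2));
        return lastBackTracePos - backPos;
    }

    public static void main(String[] args) {
        // First message is from the run above, second from the issue description.
        System.out.println(gap("backPos=3076 vs lastBackTracePos=4096"));
        System.out.println(gap("backPos=4100 vs lastBackTracePos=5120"));
    }
}
```

Both messages give the same gap of 1020, and lastBackTracePos is a multiple of 1024 in each (4096 and 5120). That may point at some boundary effect, but this is only a reading of the numbers, not a diagnosis.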
                
> KuromojiTokenizer fails with large docs
> ---------------------------------------
>
>                 Key: LUCENE-3897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3897
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>             Fix For: 3.6, 4.0
>
>
> just shoving largeish random docs triggers asserts like:
> {noformat}
>     [junit] Caused by: java.lang.AssertionError: backPos=4100 vs lastBackTracePos=5120
>     [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.backtrace(KuromojiTokenizer.java:907)
>     [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.parse(KuromojiTokenizer.java:756)
>     [junit] 	at org.apache.lucene.analysis.kuromoji.KuromojiTokenizer.incrementToken(KuromojiTokenizer.java:403)
>     [junit] 	at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:404)
> {noformat}
> But, you get no seed...
> I'll commit the test case and @Ignore it.
