lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven Rowe (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
Date Tue, 14 Feb 2012 23:45:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208129#comment-13208129
] 

Steven Rowe commented on LUCENE-3731:
-------------------------------------

Hi Tommaso,

I just committed modifications to the IntelliJ IDEA and Maven configurations.

Something strange is happening, though: one test method consistently fails under both IntelliJ
and Maven: {{UIMABaseAnalyzerTest.testRandomStrings()}}.  However, under Ant, this always
succeeds, including with the seeds that fail under either IntelliJ or Maven.  Also, under
both IntelliJ and Maven, the following sequence is printed out literally thousands of times
to STDERR (with increasing time stamps) - however, I don't see this at all under Ant:

{noformat}
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer initialize
INFO: "Whitespace tokenizer successfully initialized"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer finished processing"
{noformat}

Here are two different example failures, from Maven - they seem to have different causes,
which is baffling:

{noformat}
The following exceptions were thrown by threads:
*** Thread: Thread-1 ***
java.lang.RuntimeException: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException:
Annotator processing failed.    
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException:
Annotator processing failed.    
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:333)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing
failed.    
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.&lt;init&gt;(ASB_impl.java:409)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
        ... 3 more
Caused by: java.lang.NullPointerException
        at org.apache.uima.impl.UimaContext_ImplBase$ComponentInfoImpl.mapToSofaID(UimaContext_ImplBase.java:655)
        at org.apache.uima.cas.impl.CASImpl.getView(CASImpl.java:2646)
        at org.apache.uima.jcas.impl.JCasImpl.getView(JCasImpl.java:1415)
        at org.apache.uima.examples.tagger.HMMTagger.process(HMMTagger.java:250)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
        ... 12 more
*** Thread: Thread-2 ***
java.lang.AssertionError: token 0 does not exist
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
NOTE: reproduce with: ant test -Dtestcase=UIMABaseAnalyzerTest -Dtestmethod=testRandomStrings(org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest)
-Dtests.seed=-1dad4a7ede576939:-f9f5c77dffb3eb0:607bf59bf7da50eb -Dargs=&quot;-Dfile.encoding=Cp1252&quot;
{noformat}

{noformat}
The following exceptions were thrown by threads:
*** Thread: Thread-5 ***
java.lang.RuntimeException: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.&lt;init&gt;(ASB_impl.java:409)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
        ... 4 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 2
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:222)
        at org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:100)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:546)
        ... 11 more
*** Thread: Thread-7 ***
java.lang.RuntimeException: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.&lt;init&gt;(ASB_impl.java:409)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
        at org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
        ... 4 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 2
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:222)
        at org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:100)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:546)
        ... 11 more
*** Thread: Thread-6 ***
java.lang.AssertionError: end of stream
        at org.junit.Assert.fail(Assert.java:93)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.junit.Assert.assertFalse(Assert.java:68)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:148)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
*** Thread: Thread-4 ***
org.junit.ComparisonFailure: term 8 expected:&lt;-[]&gt; but was:&lt;-[- f(]&gt;
        at org.junit.Assert.assertEquals(Assert.java:125)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:124)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
NOTE: reproduce with: ant test -Dtestcase=UIMABaseAnalyzerTest -Dtestmethod=testRandomStrings(org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest)
-Dtests.seed=2be0c24a1df9b25e:-42f203968285c6ed:5f8c85cdbae32724 -Dargs=&quot;-Dfile.encoding=Cp1252&quot;
{noformat}
                
> Create a analysis/uima module for UIMA based tokenizers/analyzers
> -----------------------------------------------------------------
>
>                 Key: LUCENE-3731
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3731
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch
>
>
> As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a
separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima
will contain only the related factories.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message