From David Thibault <david.r.thiba...@gmail.com>
Subject StringIndexOutOfBoundsException using Solrcas
Date Thu, 03 Feb 2011 17:06:40 GMT
Hello all,

First off, I apologize for sending this to both the user and dev lists, but
I'm not sure which list should get it.  This is my first email to either
list.

I am working with UIMA and Solrcas and I'm getting this error:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:138)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1931)
    at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:119)
    at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:126)
    ... 6 more

I edited SolrCASConsumer with the following lines right before line 126:
       Annotation fsTemp = (Annotation) fs;
       System.out.println("Processing Annotation: " + fsTemp.toString());

Now, right before it calls fs.getCoveredText(), it prints this:
Processing Annotation: Phrase
   sofa: _InitialView
   begin: -1
   end: 60
   candidates: FSArray
   mappings: FSArray

So it's clear why the string index is out of bounds: begin is -1.
However, I'm not sure why my analysis engine is producing those values.
I'm using MetaMapAEApi from the NIH's MetaMap project.
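
To illustrate the failure outside of UIMA, here is a minimal sketch. It
assumes, as the stack trace suggests, that getCoveredText() comes down to
a substring(begin, end) call on the document text. OffsetGuard,
hasValidOffsets and safeCoveredText are hypothetical names made up for this
example; they are not part of UIMA or Solrcas. A check like this would only
let the consumer skip a bad annotation; it does not explain where the -1
comes from.

```java
// Hypothetical helper for illustration only; not part of UIMA or Solrcas.
final class OffsetGuard {

    /** True when [begin, end) is a usable span inside a text of the given length. */
    static boolean hasValidOffsets(int begin, int end, int textLength) {
        return begin >= 0 && begin <= end && end <= textLength;
    }

    /** Returns the covered text, or null when the offsets cannot be used. */
    static String safeCoveredText(String documentText, int begin, int end) {
        if (documentText == null
                || !hasValidOffsets(begin, end, documentText.length())) {
            return null;
        }
        return documentText.substring(begin, end);
    }

    public static void main(String[] args) {
        // Made-up sample text, long enough that end = 60 is in range.
        String text =
            "Patient presents with chest pain and shortness of breath on exertion today.";

        try {
            // Same offsets as the Phrase annotation in the debug output.
            text.substring(-1, 60);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring(-1, 60) failed: " + e.getMessage());
        }

        System.out.println(safeCoveredText(text, -1, 60)); // prints "null"
        System.out.println(safeCoveredText(text, 0, 7));   // prints "Patient"
    }
}
```

In the consumer, the equivalent check would compare the annotation's begin
and end against the length of the document text before calling
getCoveredText(), and log the annotation instead of throwing.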

This is the first phrase it processes in this document and the first
time it prints that section of debug text.  If I use the same AE in
DocumentAnalyzer, it correctly shows the first Document as starting at
position 0 and ending at position 191, with the first phrase spanning
positions 0 to 7.

I'm trying to run this in the CPE GUI with the following CPEDescriptor.xml:
<?xml version="1.0" encoding="UTF-8"?>
<cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">
    <collectionReader>
        <collectionIterator>
            <descriptor>
                <import location="../../../../../../../usr/local/apache-uima/examples/descriptors/collection_reader/FileSystemCollectionReader.xml"/>
            </descriptor>
            <configurationParameterSettings>
                <nameValuePair>
                    <name>InputDirectory</name>
                    <value>
                        <string>/Users/davidt/Documents/workspace/BioSearch/resources/test_input</string>
                    </value>
                </nameValuePair>
            </configurationParameterSettings>
        </collectionIterator>
    </collectionReader>
    <casProcessors casPoolSize="3" processingUnitThreadCount="1">
        <casProcessor deployment="integrated" name="MetaMapApiAE">
            <descriptor>
                <import location="../../../MetaMap UIMA Annotator/descriptors/MetaMapApiAE.xml"/>
            </descriptor>
            <deploymentParameters/>
            <errorHandling>
                <errorRateThreshold action="terminate" value="0/1000"/>
                <maxConsecutiveRestarts action="terminate" value="30"/>
                <timeout max="100000" default="-1"/>
            </errorHandling>
            <checkpoint batch="10000" time="1000ms"/>
            <configurationParameterSettings>
                <nameValuePair>
                    <name>tempdir_path</name>
                    <value>
                        <string>/Users/davidt/tmp</string>
                    </value>
                </nameValuePair>
            </configurationParameterSettings>
        </casProcessor>
        <casProcessor deployment="integrated" name="SolrcasAE.xml">
            <descriptor>
                <import location="../../../Apache_UIMA_Sandbox/Solrcas/desc/SolrcasAE.xml"/>
            </descriptor>
            <deploymentParameters/>
            <errorHandling>
                <errorRateThreshold action="terminate" value="0/1000"/>
                <maxConsecutiveRestarts action="terminate" value="30"/>
                <timeout max="100000" default="-1"/>
            </errorHandling>
            <checkpoint batch="10000" time="1000ms"/>
        </casProcessor>
    </casProcessors>
    <cpeConfig>
        <numToProcess>-1</numToProcess>
        <deployAs>immediate</deployAs>
        <checkpoint batch="0" time="300000ms"/>
        <timerImpl/>
    </cpeConfig>
</cpeDescription>

I'm at a loss as to where that -1 is coming from or how to debug it
further.  Any ideas would be greatly appreciated.

Best,
Dave
