incubator-ctakes-dev mailing list archives

From Steven Bethard <steven.beth...@Colorado.EDU>
Subject Re: adding the relation extractor aggregate to the regression test
Date Wed, 20 Mar 2013 12:37:25 GMT
On Mar 19, 2013, at 10:59 AM, "Masanz, James J." <> wrote:
> For #2, the other consideration is that, unlike the models, the UMLS resources are not
> available under the ALv2, so they are not going to end up in the ASF repo even if we decide
> on option 2 for the "[DISCUSS] Where should cTAKES models live?" thread.
> Because the UMLS resources will not end up in the ASF repo, for these regression test
> CPEs, instead of using DictionaryLookupAnnotatorUMLS.xml I think we are going to have to use
> Here are a few of the terms I remember offhand that are in the 'toy' dictionary used by

> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml

I guess my concern here is that the whole point of running a regression test is to make sure
that the descriptor that we ship actually works. If we're not testing the real descriptor
we expect people to use, then perhaps we shouldn't include that real descriptor at all?

Or did you envision some other process that would also test the real descriptor?


> knee
> pain
> aspirin
> -- James
>> -----Original Message-----
>> From:
>> []
>> On Behalf Of Chen, Pei
>> Sent: Saturday, March 16, 2013 8:52 AM
>> To: <>
>> Cc:
>> Subject: Re: adding the relation extractor aggregate to the regression test
>> The intended behavior of the regression test is to verify that new code
>> didn't break existing functionality, so yes, the XML output should be the
>> same as in previous runs. If there are expected changes, they should just
>> be manually verified and rerecorded. This should supplement any unit
>> tests, not replace them. It's a 20,000-ft test that a pipeline still works
>> as expected and is not really intended to replace specific logic tests.
>> It's a starting point; we can certainly add more or improve it, both
>> by adding more unit tests and by adding more regression tests.
>> 2) Yes. We'll need to add the UMLS resources if they are to be tested. I'm open
>> to ideas and volunteers, as I haven't gotten to that point yet :)
>> Sent from my iPhone
>> On Mar 16, 2013, at 8:48 AM, "Steven Bethard" <steven.bethard@Colorado.EDU> wrote:
>>> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <> wrote:
>>>> If you have spare time, do you want to also try adding the relation
>>>> extractor aggregate to the regression test, so that it (the pipeline as
>>>> well as the XML descriptor configuration) is automatically tested in the future?
>>>> It should be as simple as adding a CPE to the directory
>>>> /ctakes-regression-test/desc/collection_processing_engine/
>>>> Take a look at
>>>> test/desc/collection_processing_engine/CoreferenceCPETest.xml
>>>> For example:
>>>> 1) Clone it and point the CPE to ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
>>>> 2) Run mvn test once (it should probably fail because there is nothing to compare against yet, but just collect the generated results).
>>>> 3) Copy the results from generatedoutput/{NameofCPEFilename}/ into expectedoutput/{NameofCPEFilename}
>>>> 4) Check the expectedoutput into SVN.
>>>> 5) Now every time mvn test is run, that CPE will be executed and its results compared automatically.
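The comparison in step 5) can be illustrated with a minimal sketch. It assumes only the directory layout named in steps 3) and 5) (expectedoutput/{NameofCPEFilename}/ versus generatedoutput/{NameofCPEFilename}/) and byte-for-byte equality of each recorded file; the class and method names are hypothetical, and this is not the actual ctakes-regression-test implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not the actual ctakes-regression-test code: compares each
// file in expectedoutput/{CPE}/ with the same-named file in generatedoutput/{CPE}/.
public class RegressionDirDiff {

    // Sorted names of the regular files directly inside dir (no recursion).
    static Set<String> fileNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Returns one message per mismatch; an empty list means the run matches the recording.
    static List<String> compare(Path expectedDir, Path generatedDir) throws IOException {
        List<String> problems = new ArrayList<>();
        Set<String> expected = fileNames(expectedDir);
        Set<String> generated = fileNames(generatedDir);
        for (String name : expected) {
            if (!generated.contains(name)) {
                problems.add("missing: " + name);
            } else if (!Arrays.equals(Files.readAllBytes(expectedDir.resolve(name)),
                                      Files.readAllBytes(generatedDir.resolve(name)))) {
                // Byte-for-byte comparison: any change, even whitespace, is reported.
                problems.add("differs: " + name);
            }
        }
        for (String name : generated) {
            if (!expected.contains(name)) {
                problems.add("unexpected: " + name);
            }
        }
        return problems;
    }
}
```

A test would presumably fail with the returned messages whenever the list is non-empty. Strict byte equality is the literal reading of "the XML output should be the same as in previous runs"; it also means any change, including a correct fix, fails until the expected output is rerecorded, which is the concern raised in point (1).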
>>> First, a general comment about the regression test, and then some
>>> details about where I'm currently stuck.
>>> (1) Is it really a good idea to be asserting that the XML files
>>> generated by cTAKES components should always be identical? Particularly if
>>> the current components make some mistakes, shouldn't we only be asserting
>>> the things that they get right? Something more along the lines of
>>> where we have individual assertions for each thing the relation extractor
>>> should have found?
>>> (2) In trying to add the CPETest, I got stuck trying to get
>>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
>>> to work. (This descriptor is referenced by
>>> ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.)
>>> Here's the error I'm getting:
>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>   at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(
>>>   ...
>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>   at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(
>>>   ...
>>> Caused by: org.apache.uima.resource.ResourceInitializationException
>>>   at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneI
>>>   ...
>>> Caused by: org/apache/ctakes/dictionary/lookup/rxnorm_index
>>>   at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
>>>   at org.apache.ctakes.core.resource.FileLocator.locateFile()
>>>   at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneI
>>>   ... 53 more
>>> I assume this is because the UMLS indexes aren't in SVN anymore. What's
>>> the proper way to reference these now, and should
>>> DictionaryLookupAnnotatorUMLS.xml be updated accordingly?
>>> Thanks,
>>> Steve
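Point (1) in Steve's message suggests asserting only the findings a component is known to get right, instead of requiring identical XML. A minimal sketch of that idea follows, with findings reduced to plain strings. The class, the method, and the string format are all hypothetical; a real test would have to extract the findings from the CAS produced by the pipeline.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not cTAKES code: each finding is a plain string; a real
// test would build these from annotations in the CAS produced by the pipeline.
public class ExpectedFindingsCheck {

    // Returns the expected findings absent from the actual output, in the order
    // they were listed. Extra findings (including known mistakes) are ignored,
    // so they never fail the check.
    static List<String> missingFindings(Collection<String> expected, Collection<String> actual) {
        Set<String> actualSet = new HashSet<>(actual);
        List<String> missing = new ArrayList<>();
        for (String finding : expected) {
            if (!actualSet.contains(finding)) {
                missing.add(finding);
            }
        }
        return missing;
    }
}
```

For example, with expected findings knee, pain and aspirin (the toy-dictionary terms listed earlier in the thread) and an actual output containing pain, knee and one spurious extra, only aspirin is reported missing; the spurious extra does not fail the check, which is the trade-off point (1) argues for.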
