ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject Re: adding the relation extractor aggregate to the regression test
Date Sat, 16 Mar 2013 13:52:04 GMT
The intended behavior of the regression test is to verify that new code didn't break existing
functionality, so yes, the XML output should be the same as in previous runs. If there are expected
changes, they should be manually verified and the output re-recorded. This should supplement the
unit tests, not replace them. It's a 20,000-ft test that a pipeline still works as expected,
and it isn't intended to replace specific logic tests.
It's a starting point; we can certainly add to it or improve it, both by adding more
unit tests and by adding more regression tests.
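The golden-file idea described above boils down to comparing each recorded output file with the freshly generated one. The Java sketch below only illustrates that idea under stated assumptions; it is not the actual ctakes-regression-test code. The class name GoldenFileCheck is hypothetical, the comparison is assumed to be byte-for-byte between an expected-output directory and a generated-output directory, and Files.mismatch requires Java 12 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical golden-file check (not the real cTAKES regression test):
 * every file under expectedDir must exist under generatedDir with identical bytes.
 * Extra files that exist only in generatedDir are not flagged.
 */
final class GoldenFileCheck {

    private GoldenFileCheck() {}

    /** Returns relative paths of expected files that are missing or differ in the generated output. */
    static List<String> findMismatches(Path expectedDir, Path generatedDir) throws IOException {
        List<Path> expectedFiles;
        try (Stream<Path> walk = Files.walk(expectedDir)) {
            // Sort so the report order is deterministic across platforms.
            expectedFiles = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> mismatches = new ArrayList<>();
        for (Path expected : expectedFiles) {
            Path relative = expectedDir.relativize(expected);
            Path generated = generatedDir.resolve(relative);
            // A missing file is a regression; so is any byte-level difference
            // (Files.mismatch returns -1L when the two files have identical contents).
            if (!Files.isRegularFile(generated) || Files.mismatch(expected, generated) != -1L) {
                mismatches.add(relative.toString());
            }
        }
        return mismatches;
    }
}
```

An empty result means the pipeline reproduced the recorded output exactly; anything else is a change that someone has to verify by hand before re-recording.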

(2) Yes. We'll need to add the UMLS resources if they are to be tested. I'm open to ideas and volunteers,
as I haven't gotten to that point yet :)

Sent from my iPhone

On Mar 16, 2013, at 8:48 AM, "Steven Bethard" <steven.bethard@Colorado.EDU> wrote:

> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <jira@apache.org> wrote:
>> If you have spare time, do you want to also try adding the relation extractor aggregate to the regression test, so that this pipeline (as well as its XML descriptor configuration) is automatically tested in the future?
>> It should be as simple as adding a CPE to the directory.
>> /ctakes-regression-test/desc/collection_processing_engine/
>> Take a look at http://svn.apache.org/repos/asf/incubator/ctakes/trunk/ctakes-regression-test/desc/collection_processing_engine/CoreferenceCPETest.xml
>> For example:
>> 1)    Just clone it and point the CPE to ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
>> 2)    Run mvn test once (it will probably fail because there is nothing to compare with, but just collect the generated results).
>> 3)    Copy the results from generatedoutput/{NameofCPEFilename}/ into expectedoutput/{NameofCPEFilename}.
>> 4)    Check the expectedoutput into SVN.
>> 5)    Now every time mvn test is run, that CPE will be executed and its results compared.
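Step 3 is essentially a recursive directory copy. A minimal sketch, assuming Java 11+ and the generatedoutput/{NameofCPEFilename} to expectedoutput/{NameofCPEFilename} layout from the steps above; BaselineRecorder and record are hypothetical names, not part of cTAKES or its Maven build:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper for step 3: record a CPE's generated output as the new
 * expected baseline. Not part of cTAKES; names are illustrative only.
 */
final class BaselineRecorder {

    private BaselineRecorder() {}

    /** Copies every file under generatedDir into expectedDir, overwriting old baselines; returns the file count. */
    static int record(Path generatedDir, Path expectedDir) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(generatedDir)) {
            sources = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path source : sources) {
            // Keep the same relative layout (e.g. per-document subfolders) under expectedDir.
            Path target = expectedDir.resolve(generatedDir.relativize(source));
            Files.createDirectories(target.getParent());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return sources.size();
    }
}
```

Because REPLACE_EXISTING silently overwrites the old baseline, a helper like this should only be run after the new output has been checked by hand, and the resulting diff reviewed before committing it in step 4.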
> First, a general comment about the regression test, and then some details about where I'm currently stuck.
> (1) Is it really a good idea to assert that the XML files generated by cTAKES components are always identical? Particularly if the current components make some mistakes, shouldn't we assert only the things they get right? Something more along the lines of org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest, where we have individual assertions for each thing the relation extractor should have found?
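The alternative proposed in (1) can be sketched as a subset check: each annotation known to be correct must appear in the output, while anything else the pipeline produces is tolerated. This is a hypothetical illustration, not how RelationExtractorAnnotatorsTest works; the class SpotCheck, the idea of matching elements by tag name, and the begin/end attribute names are all assumptions rather than the real cTAKES XMI schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical spot check: assert only the annotations known to be right.
 * Element and attribute names are assumptions, not the cTAKES XMI schema.
 */
final class SpotCheck {

    private SpotCheck() {}

    /** Collects a "begin-end" key for every element with the given tag name. */
    static Set<String> spans(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        Set<String> found = new HashSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            found.add(element.getAttribute("begin") + "-" + element.getAttribute("end"));
        }
        return found;
    }

    /** Returns the expected spans absent from the output; extra spans in the output are ignored. */
    static Set<String> missing(String xml, String tagName, Set<String> expectedSpans) throws Exception {
        Set<String> absent = new HashSet<>(expectedSpans);
        absent.removeAll(spans(xml, tagName));
        return absent;
    }
}
```

The design choice is that an extra annotation (for example a known mistake, or a new correct find) does not fail the check; only losing something that was previously right does.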
> (2) In trying to add the CPE test, I got stuck getting ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml to work. (This descriptor is referenced by ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.) Here's the error I'm getting:
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>    at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:83)
>    ...
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>    at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1104)
>    ...
> Caused by: org.apache.uima.resource.ResourceInitializationException
>    at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:80)
>    ...
> Caused by: java.io.FileNotFoundException: org/apache/ctakes/dictionary/lookup/rxnorm_index
>    at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
>    at org.apache.ctakes.core.resource.FileLocator.locateFile(FileLocator.java:44)
>    at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:58)
>    ... 53 more
> I assume this is because the UMLS indexes aren't in SVN anymore. What's the proper way to reference these now, and should DictionaryLookupAnnotatorUMLS.xml be updated appropriately?
> Thanks,
> Steve