ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject Re: adding the relation extractor aggregate to the regression test
Date Wed, 20 Mar 2013 14:39:07 GMT
Perhaps we can start without the UMLS resources first. (I was not very comfortable with the last
release because of the limited test coverage, so I hope this will be a starting point.)

I think it is definitely possible to automatically download the UMLS resources, unpack them, have the
tester provide credentials, and run the tests. None of this gets distributed, so I think it's feasible.
I could take a stab at this in a few weeks unless someone gives it a shot first.
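The idea of running the UMLS-dependent tests only when a tester supplies credentials can be sketched as a gate in the test runner. This is a minimal sketch under stated assumptions, not existing cTAKES code: the class `UmlsTestGate`, the system property names `umls.user` and `umls.password`, and the convention that UMLS-dependent CPE file names contain "UMLS" are all hypothetical, and the download and unpacking steps are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical helper (not part of cTAKES): decides which regression CPEs
 * can run, given whether the tester supplied UMLS credentials.
 */
final class UmlsTestGate {
    // Hypothetical property names chosen for this sketch; cTAKES does not define them.
    static final String USER_PROP = "umls.user";
    static final String PASSWORD_PROP = "umls.password";

    private UmlsTestGate() {}

    /** True only when both credentials are present and non-blank. */
    static boolean hasCredentials(Properties props) {
        String user = props.getProperty(USER_PROP, "").trim();
        String password = props.getProperty(PASSWORD_PROP, "").trim();
        return !user.isEmpty() && !password.isEmpty();
    }

    /**
     * Filters CPE descriptor file names down to the ones that can run here.
     * Assumption: descriptors whose name contains "UMLS" need the licensed
     * resources, so they are skipped when no credentials were supplied.
     */
    static List<String> runnableCpes(List<String> cpeFiles, Properties props) {
        boolean umlsAvailable = hasCredentials(props);
        List<String> runnable = new ArrayList<>();
        for (String cpe : cpeFiles) {
            boolean needsUmls = cpe.contains("UMLS");
            if (!needsUmls || umlsAvailable) {
                runnable.add(cpe);
            }
        }
        return runnable;
    }
}
```

A runner could call `UmlsTestGate.runnableCpes(cpeNames, System.getProperties())`, so the non-UMLS CPEs always run and the UMLS ones run only when credentials are provided, for example when an RC is built.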

Sent from my iPhone

On Mar 20, 2013, at 10:27 AM, "Masanz, James J." <Masanz.James@mayo.edu> wrote:

> Perhaps you or Pei can weigh in on the feasibility of having the UMLS resources downloaded
> from somewhere else for these regression tests that get run automatically.
> I was guessing we would have a separate set of tests, run when an RC was built, that
> would test the pipelines that include the UMLS resources, but that the regression
> tests in ctakes-regression-test would at least ensure that the other (non-UMLS) parts of
> the pipelines still work.
> -- James
>> -----Original Message-----
>> From: ctakes-dev-return-1387-Masanz.James=mayo.edu@incubator.apache.org
>> [mailto:ctakes-dev-return-1387-Masanz.James=mayo.edu@incubator.apache.org]
>> On Behalf Of Steven Bethard
>> Sent: Wednesday, March 20, 2013 7:37 AM
>> To: ctakes-dev@incubator.apache.org
>> Subject: Re: adding the relation extractor aggregate to the regression test
>> On Mar 19, 2013, at 10:59 AM, "Masanz, James J." <Masanz.James@mayo.edu> wrote:
>>> For #2, the other consideration is that, unlike the models, the UMLS
>>> resources are not available under the ALv2, so they are not going to
>>> end up in the ASF repo even if we decide on option 2 in the
>>> "[DISCUSS] Where should cTAKES models live?" thread.
>>> Because the UMLS resources will not end up in the ASF repo, for these
>>> regression test CPEs, instead of using
>>> DictionaryLookupAnnotatorUMLS.xml, I think we are going to have to use
>>> DictionaryLookupAnnotator.xml.
>>> Here are a few of the terms I remember offhand that are in the 'toy'
>>> dictionary used by
>>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml:
>> I guess my concern here is that the whole point of running a regression
>> test is to make sure that the descriptor that we ship actually works. If
>> we're not testing the real descriptor we expect people to use, then
>> perhaps we shouldn't include that real descriptor at all?
>> Or did you envision some other process that would also test the real
>> descriptor?
>> Steve
>>> knee
>>> pain
>>> aspirin
>>> -- James
>>>> -----Original Message-----
>>>> From: ctakes-dev-return-1381-Masanz.James=mayo.edu@incubator.apache.org
>>>> [mailto:ctakes-dev-return-1381-Masanz.James=mayo.edu@incubator.apache.org]
>>>> On Behalf Of Chen, Pei
>>>> Sent: Saturday, March 16, 2013 8:52 AM
>>>> To: <ctakes-dev@incubator.apache.org>
>>>> Cc: ctakes-dev@incubator.apache.org
>>>> Subject: Re: adding the relation extractor aggregate to the regression test
>>>> The intended behavior of the regression test is to verify that new
>>>> code didn't break existing functionality, so yes, the XML output
>>>> should be the same as in previous runs. If there are expected changes,
>>>> they should be manually verified and rerecorded. This should
>>>> supplement any unit tests, not replace them. It's a 20,000 ft test
>>>> that a pipeline still works as expected and not really intended to
>>>> replace specific logical tests.
>>>> It's a starting point; we can certainly add more or improve it, both
>>>> in terms of adding more unit tests and more regression tests.
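Rerecording after a reviewed, expected change amounts to replacing a CPE's baseline with its new output. A minimal sketch, assuming the generatedoutput/{NameofCPEFilename}/ and expectedoutput/{NameofCPEFilename}/ layout described in the JIRA comment quoted further down this thread; `ExpectedOutputRecorder` is a hypothetical name, not part of ctakes-regression-test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: replaces the recorded baseline for one CPE after manual review. */
final class ExpectedOutputRecorder {
    private ExpectedOutputRecorder() {}

    /**
     * Copies every regular file under generatedDir to the same relative
     * location under expectedDir, overwriting older baseline files.
     * Returns the number of files copied.
     */
    static int record(Path generatedDir, Path expectedDir) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(generatedDir)) {
            sources = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path source : sources) {
            Path target = expectedDir.resolve(generatedDir.relativize(source));
            Files.createDirectories(target.getParent());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return sources.size();
    }
}
```

This sketch does not delete baseline files that are no longer generated; those would have to be removed by hand before the new expected output is checked into SVN.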
>>>> 2) Yes. We'll need to add the UMLS resources if they are to be tested.
>>>> Open to ideas and volunteers, as I didn't get to that point yet :)
>>>> Sent from my iPhone
>>>> On Mar 16, 2013, at 8:48 AM, "Steven Bethard" <steven.bethard@Colorado.EDU> wrote:
>>>>> On Mar 15, 2013, at 9:26 PM, "Pei Chen (JIRA)" <jira@apache.org> wrote:
>>>>>> If you have spare time, do you want to also try adding the relation
>>>>>> extractor aggregate to the regression test, so that it (the pipeline as
>>>>>> well as the XML descriptor configuration) is automatically tested in the future?
>>>>>> It should be as simple as adding a CPE to the directory
>>>>>> /ctakes-regression-test/desc/collection_processing_engine/
>>>>>> Take a look at
>>>>>> http://svn.apache.org/repos/asf/incubator/ctakes/trunk/ctakes-regression-test/desc/collection_processing_engine/CoreferenceCPETest.xml
>>>>>> For example:
>>>>>> 1) Clone that CPE and point it to
>>>>>>    ../../../ctakes-relation-extractor/desc/analysis_engine/RelationExtractorAggregate.xml instead.
>>>>>> 2) Run mvn test once. (It will probably fail because there is nothing
>>>>>>    to compare with yet; just collect the generated results.)
>>>>>> 3) Copy the results from generatedoutput/{NameofCPEFilename}/ to
>>>>>>    expectedoutput/{NameofCPEFilename}/.
>>>>>> 4) Check the expectedoutput into SVN.
>>>>>> 5) Now every time mvn test is run, that CPE will be executed and its
>>>>>>    results compared automatically.
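The automatic comparison in step 5 can be pictured as a directory diff between the recorded baseline and the new output. The sketch below assumes that same two-directory layout and compares files byte for byte; `OutputComparator` is a hypothetical name, and the actual regression test may compare its XML output differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists how a CPE's generated output differs from its recorded baseline. */
final class OutputComparator {
    private OutputComparator() {}

    /** Returns one message per problem; an empty list means the output matches exactly. */
    static List<String> compare(Path expectedDir, Path generatedDir) throws IOException {
        List<String> problems = new ArrayList<>();
        if (!Files.isDirectory(expectedDir)) {
            // Step 2: the first run has no recorded baseline, so it is reported as a failure.
            problems.add("no expected output at " + expectedDir);
            return problems;
        }
        if (!Files.isDirectory(generatedDir)) {
            problems.add("no generated output at " + generatedDir);
            return problems;
        }
        for (Path expected : regularFiles(expectedDir)) {
            Path relative = expectedDir.relativize(expected);
            Path generated = generatedDir.resolve(relative);
            if (!Files.isRegularFile(generated)) {
                problems.add("missing: " + relative);
            } else if (!Arrays.equals(Files.readAllBytes(expected), Files.readAllBytes(generated))) {
                problems.add("differs: " + relative);
            }
        }
        // Files produced now but never recorded are reported too.
        for (Path generated : regularFiles(generatedDir)) {
            Path relative = generatedDir.relativize(generated);
            if (!Files.isRegularFile(expectedDir.resolve(relative))) {
                problems.add("unexpected: " + relative);
            }
        }
        return problems;
    }

    private static List<Path> regularFiles(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }
}
```

Because it is byte for byte, this check flags any change in the serialized XML, including harmless ones; that strictness is what point (1) in the reply questions.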
>>>>> First, a general comment about the regression test, and then some
>>>>> details about where I'm currently stuck.
>>>>> (1) Is it really a good idea to be asserting that the XML files
>>>>> generated by cTAKES components should always be identical?
>>>>> Particularly if the current components make some mistakes, shouldn't
>>>>> we only be asserting the things that they get right? Something more
>>>>> along the lines of
>>>>> org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotatorsTest,
>>>>> where we have individual assertions for each thing the relation
>>>>> extractor should have found?
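The alternative described here checks only the relations a component is known to get right. A hedged, plain-Java sketch of that idea follows; it does not use UIMA or uimaFIT, and the `Relation` and `RelationChecks` classes, the relation labels, and the arguments are hypothetical placeholders rather than the types used in `RelationExtractorAnnotatorsTest`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

/** Hypothetical value type for one extracted binary relation (label plus two argument texts). */
final class Relation {
    final String category;
    final String arg1;
    final String arg2;

    Relation(String category, String arg1, String arg2) {
        this.category = category;
        this.arg1 = arg1;
        this.arg2 = arg2;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Relation)) {
            return false;
        }
        Relation r = (Relation) o;
        return category.equals(r.category) && arg1.equals(r.arg1) && arg2.equals(r.arg2);
    }

    @Override
    public int hashCode() {
        return Objects.hash(category, arg1, arg2);
    }

    @Override
    public String toString() {
        return category + "(" + arg1 + ", " + arg2 + ")";
    }
}

/** Hypothetical helper for targeted assertions. */
final class RelationChecks {
    private RelationChecks() {}

    /** Returns the expected relations that were not found; any extra output is ignored. */
    static List<Relation> missing(Collection<Relation> expected, Collection<Relation> found) {
        List<Relation> result = new ArrayList<>();
        for (Relation r : expected) {
            if (!found.contains(r)) {
                result.add(r);
            }
        }
        return result;
    }
}
```

With this style, a test fails only when an expected relation disappears, so unrelated changes elsewhere in the output do not break it.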
>>>>> (2) In trying to add the CPETest, I got stuck trying to get
>>>>> ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
>>>>> to work. (This descriptor is referenced by
>>>>> ctakes-relation-extractor/desc/analysis_engine/RelationExtractorPreprocessor.xml.)
>>>>> Here's the error I'm getting:
>>>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>>>  at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:83)
>>>>>  ...
>>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "RelationExtractorCPETest" failed.
>>>>>  at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1104)
>>>>>  ...
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException
>>>>>  at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:80)
>>>>>  ...
>>>>> Caused by: java.io.FileNotFoundException: org/apache/ctakes/dictionary/lookup/rxnorm_index
>>>>>  at org.apache.ctakes.core.resource.FileLocator.locateExplicitly(FileLocator.java:69)
>>>>>  at org.apache.ctakes.core.resource.FileLocator.locateFile(FileLocator.java:44)
>>>>>  at org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl.load(LuceneIndexReaderResourceImpl.java:58)
>>>>>  ... 53 more
>>>>> I assume this is because the UMLS indexes aren't in SVN anymore. What's
>>>>> the proper way to reference these now, and should
>>>>> DictionaryLookupAnnotatorUMLS.xml be updated appropriately?
>>>>> Thanks,
>>>>> Steve
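A preflight check could turn a missing index into a clear skip or failure message before the CPE is initialized, instead of the nested stack trace above. This sketch looks for a path first relative to the working directory and then on the classpath; that lookup order is an assumption for illustration and is not how `FileLocator` actually resolves paths. `ResourcePreflight` is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports required resource paths that cannot be found. */
final class ResourcePreflight {
    private ResourcePreflight() {}

    /** True if the path exists on disk (relative to the working directory) or on the classpath. */
    static boolean isAvailable(String resourcePath, ClassLoader loader) {
        Path onDisk = Paths.get(resourcePath);
        if (Files.exists(onDisk)) {
            return true;
        }
        return loader.getResource(resourcePath) != null;
    }

    /** Returns the paths that could not be found, in input order. */
    static List<String> missing(List<String> resourcePaths, ClassLoader loader) {
        List<String> result = new ArrayList<>();
        for (String path : resourcePaths) {
            if (!isAvailable(path, loader)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

For example, `ResourcePreflight.missing(Arrays.asList("org/apache/ctakes/dictionary/lookup/rxnorm_index"), ResourcePreflight.class.getClassLoader())` would list the RxNorm index path from the trace if it is neither on disk nor on the classpath. `ClassLoader.getResource` may not find a directory inside a JAR unless the JAR contains an explicit entry for it.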
