ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: v_snomed_fword_lookup view
Date Wed, 13 Aug 2014 18:22:31 GMT
You can find example Cas Consumers in cTakes-core ..[dirPath]../cc/
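
A minimal sketch of the pattern Tim describes in the quoted message below: collect CUIs per document in process(), then write everything once in collectionProcessComplete(). It is plain Java with no UIMA dependency, so the class name, the method signatures, and the dense 0/1 ARFF layout are all assumptions made for illustration, not cTAKES or YTEX code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, UIMA-free stand-in for a consumer; all names are illustrative. */
class CuiArffCollector {
    // Document id -> CUIs seen in that document (ordinary Java state, not CAS data).
    private final Map<String, Set<String>> cuisByDoc = new LinkedHashMap<>();
    private final Writer out;

    CuiArffCollector(Writer out) {
        this.out = out;
    }

    /** Mirrors process(): called once per document; it only stores, never writes. */
    void process(String docId, Iterable<String> cuis) {
        Set<String> seen = cuisByDoc.computeIfAbsent(docId, k -> new TreeSet<>());
        for (String cui : cuis) {
            seen.add(cui);
        }
    }

    /** Mirrors collectionProcessComplete(): called once, after all documents. */
    void collectionProcessComplete() {
        // The attribute list needs every CUI in the corpus, so build it only now.
        Set<String> allCuis = new TreeSet<>();
        for (Set<String> docCuis : cuisByDoc.values()) {
            allCuis.addAll(docCuis);
        }
        try {
            out.write("@relation cuis\n");
            for (String cui : allCuis) {
                out.write("@attribute " + cui + " {0,1}\n");
            }
            out.write("@data\n");
            // One row per document, one 0/1 column per CUI, in attribute order.
            for (Set<String> docCuis : cuisByDoc.values()) {
                StringBuilder row = new StringBuilder();
                for (String cui : allCuis) {
                    if (row.length() > 0) {
                        row.append(',');
                    }
                    row.append(docCuis.contains(cui) ? '1' : '0');
                }
                out.write(row + "\n");
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real pipeline the two methods would presumably live in an annotator or consumer class, and the CUIs would be read from the annotations in the CAS rather than passed in as strings.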

> -----Original Message-----
> From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
> Sent: Wednesday, August 13, 2014 2:20 PM
> To: dev@ctakes.apache.org
> Subject: Re: v_snomed_fword_lookup view
> 
> There's nothing conceptually special about the consumer model vs.
> regular annotators (Analysis Engines). You can write an output format from any
> analysis engine as long as it is after the annotations you need in the pipeline. If
> you have global constraints (like in an ARFF file I think you need to know all the
> CUIs in your corpus to write the attribute list?), then it is important to use the
> process() method [called once per document] to store CUIs in a non-UIMA class
> variable (for example, a map from file id to a list/set/multiset of CUIs), and then
> use the collectionProcessComplete() [called once after all documents have been
> processed] method to do the actual writing of the file.
> 
> Hope that is useful, sorry I couldn't tie it in to your previous YTEX exporter but
> I'm not familiar with that process.
> 
> Tim
> 
> 
> On 08/13/2014 02:11 PM, Clayton Turner wrote:
> > Oh okay, so is the purpose of a CasConsumer to essentially save your
> > data in a representation that you can do some kind of data mining or
> > classification on it?  If so, then I think I need to look into
> > making/using one of those.
> >
> >
> > On Wed, Aug 13, 2014 at 1:41 PM, Finan, Sean
> > <Sean.Finan@childrens.harvard.edu> wrote:
> >
> >> Hi Clayton,
> >>
> >> I'm glad that you got it working.  Though I stated that I would, I
> >> haven't yet checked the fidelity of trunk.  Urgent data request one
> >> day, "must have" writing the next ... and I still live with the
> >> delusion that I left academia to have free time ...
> >>
> >> I have never used ytex or weka, so I'm unfamiliar with all things .arff.
> >> Could it be that the ytex .arff exporter needs to change consumed
> >> cTakes annotation classes (>3.1)?
> >>
> >> I have a custom CasConsumer that saves text spans and Cuis to file in
> >> a simple list, and that is what I used for the performance analysis
> >> of the lookup module.  For our other projects here in Beantown we
> >> have other various outputs that fit the job at hand: text flat files,
> >> xml files, sql database tables, knot-encoded lace doilies, etc.
> >>
> >> I'm sure that none of the above helps you, but I felt obliged to
> >> provide some kind of answer to your question.
> >>
> >> Sean
> >>
> >>> -----Original Message-----
> >>> From: clayclay911@gmail.com [mailto:clayclay911@gmail.com] On Behalf
> >>> Of Clayton Turner
> >>> Sent: Wednesday, August 13, 2014 12:25 PM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Re: v_snomed_fword_lookup view
> >>>
> >>> Okay, I believe I have ctakes dictionary fast working now. Something I'm
> >>> curious about, though, is how you extract the data in order to conduct
> >>> analysis.
> >>>
> >>> I've, in the past, been using the SparseDataExporterImpl from ytex in
> >>> order to create a .arff file for use in weka, but the ctakes pipeline
> >>> I'm using doesn't seem to be compatible with this ytex exporting as I'm
> >>> not getting any cuis in my arff file.
> >>>
> >>> I'm using the aggregate plain text umls processor analysis engine from
> >>> ctakes and then using the dbconsumer analysis engine from ytex (for
> >>> storing into the database with regard to analysis batch).
> >>>
> >>> Any tips for exporting or some simple issue I'm missing?
> >>>
> >>> Thanks,
> >>> Clayton
> >>>
> >>>
> >>> On Mon, Aug 11, 2014 at 2:09 PM, Harpreet Khanduja <hsk5004@rit.edu>
> >>> wrote:
> >>>
> >>>> Yes, absolutely and
> >>>> no problem at all.
> >>>>
> >>>> Regards,
> >>>> Harpreet
> >>>>
> >>>>
> >>>> On Mon, Aug 11, 2014 at 1:16 PM, Finan, Sean
> >>>> <Sean.Finan@childrens.harvard.edu> wrote:
> >>>>
> >>>>> Thanks Harpreet,
> >>>>> That is definitely necessary to build!
> >>>>>
> >>>>> Those lines should already be in the pom, but commented out.  I think
> >>>>> that some version/branching issues may have arisen at some point wrt
> >>>>> this module ...
> >>>>>
> >>>>> If somebody beats me to it then cheers, otherwise I will try to
> >>>>> check out tonight and get all the bits in place.
> >>>>>
> >>>>> Sean
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Harpreet Khanduja [mailto:hsk5004@rit.edu]
> >>>>>> Sent: Monday, August 11, 2014 1:12 PM
> >>>>>> To: dev@ctakes.apache.org
> >>>>>> Subject: Re: v_snomed_fword_lookup view
> >>>>>>
> >>>>>> Hello Clayton,
> >>>>>>   I do not know about ytex, but I did switch from dictionary-lookup to
> >>>>>> dictionary-lookup-fast.
> >>>>>>   I updated my ctakes-dictionary-lookup-fast project using maven. I think
> >>>>>> I used Team-Update and switched to the latest revision available, and
> >>>>>> then I downloaded new 3.2 resources from the for umls. Then I added
> >>>>>> these resources to my ctakes-dictionary-lookup-fast resources folder and
> >>>>>> also to the classpath in ctakes-clinical-pipeline.
> >>>>>>
> >>>>>> Then I changed the pom.xml file which belongs to the whole ctakes
> >>>>>> project and added these two dependencies to the file:
> >>>>>>
> >>>>>> <dependency>
> >>>>>>   <groupId>org.apache.ctakes</groupId>
> >>>>>>   <artifactId>ctakes-dictionary-lookup-res</artifactId>
> >>>>>>   <version>${ctakes.version}</version>
> >>>>>> </dependency>
> >>>>>> <dependency>
> >>>>>>   <groupId>org.apache.ctakes</groupId>
> >>>>>>   <artifactId>ctakes-dictionary-lookup-fast</artifactId>
> >>>>>>   <version>${ctakes.version}</version>
> >>>>>> </dependency>
> >>>>>>
> >>>>>> After this, I also added the dependency
> >>>>>>
> >>>>>> <dependency>
> >>>>>>   <groupId>org.apache.ctakes</groupId>
> >>>>>>   <artifactId>ctakes-dictionary-lookup-fast</artifactId>
> >>>>>> </dependency>
> >>>>>>
> >>>>>> to the pom.xml of ctakes-clinical-pipeline.
> >>>>>>
> >>>>>> And then I added the resources folder in ctakes-clinical-pipeline using
> >>>>>> the build path configuration under the "add class" option.
> >>>>>>
> >>>>>> After this it should work.
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Harpreet
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Aug 11, 2014 at 12:44 PM, Clayton Turner
> >>>>>> <caturner3@g.cofc.edu> wrote:
> >>>>>>
> >>>>>>> I still get the same error with the ctakes3.2 branch. Any suggestions?
> >>>>>>>
> >>>>>>> On Mon, Aug 11, 2014 at 12:06 PM, Clayton Turner
> >>>>>>> <caturner3@g.cofc.edu>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I'm going to do a clean install through the repo rather than the
> >>>>>>>> binaries and see if that fixes my issue because I think I just read a
> >>>>>>>> past post saying the lookup2 folders exist there.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Aug 11, 2014 at 11:52 AM, Clayton Turner
> >>>>>>>> <caturner3@g.cofc.edu>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> When navigating to
> >>>>>>>>> ctakes-dictionary-lookup-fast\desc\analysis_engine
> >>>>>>>>> there are 2 files, presumably analysis engines:
> >>>>>>>>>
> >>>>>>>>> SnomedLookupAnnotator.xml and SnomedOvLookupAnnotator.xml
> >>>>>>>>>
> >>>>>>>>> If I pick either, I put in my UMLS information but receive an error
> >>>>>>>>> when trying to run the CPE:
> >>>>>>>>>
> >>>>>>>>> Initialization of CAS Processor with name "SnomedOvLookupAnnotator" failed.
> >>>>>>>>> CausedBy: org.apache.uima.resource.ResourceConfigurationException:
> >>>>>>>>> Initialization of CAS processor with name "SnomedOvLookupAnnotator" failed.
> >>>>>>>>> CausedBy: org.apache.uima.resource.ResourceInitializationException:
> >>>>>>>>> Error initializing "org.apache.uima.resource.impl.DataResource_impl"
> >>>>>>>>> from descriptor file:..............SnomedLookupAnnotator.xml
> >>>>>>>>> CausedBy: org.apache.uima.resource.ResourceInitializationException:
> >>>>>>>>> Could not access the resource data at
> >>>>>>>>> file:org\apache\ctakes\dictionary\lookup2\Snomed2011ab_ctakesTui\cTakesSnomed.xml
> >>>>>>>>>
> >>>>>>>>> Now, I don't even have a "lookup2" folder and, consequently, I don't
> >>>>>>>>> have the Tui folder or the cTakesSnomed.xml file. This seems to be the
> >>>>>>>>> problem, but I'm not sure where these files are supposed to be grabbed
> >>>>>>>>> from.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Aug 11, 2014 at 11:47 AM, Clayton Turner
> >>>>>>>>> <caturner3@g.cofc.edu>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi again:
> >>>>>>>>>>
> >>>>>>>>>> How exactly do you switch to using the cTakes dictionary-lookup-fast?
> >>>>>>>>>> Do I need to go in and alter xml files or is it as simple as adding a
> >>>>>>>>>> certain item to the list of analysis engines?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Aug 8, 2014 at 3:48 PM, Finan, Sean
> >>>>>>>>>> <Sean.Finan@childrens.harvard.edu> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Clayton,
> >>>>>>>>>>>
> >>>>>>>>>>> I don't know how the ytex dictionary lookup works, so I'm afraid that I
> >>>>>>>>>>> can't help you with an answer.  Maybe Vijay is the best person to do
> >>>>>>>>>>> this.  If you aren't tied to ytex you could try the new cTakes
> >>>>>>>>>>> dictionary-lookup-fast.  I tested "Patient came in with a malar rash"
> >>>>>>>>>>> and it found "malar" and "malar rash".
> >>>>>>>>>>>
> >>>>>>>>>>> Vijay,
> >>>>>>>>>>>
> >>>>>>>>>>> At some point the lookup-fast module will be the default for the cTakes
> >>>>>>>>>>> clinical pipeline.  In order to synchronize the ytex lookup with cTakes,
> >>>>>>>>>>> would you like to eventually work together on reusing the same code for
> >>>>>>>>>>> ytex?  I have no idea what ytex does, but I know the ins and outs of the
> >>>>>>>>>>> cdl-fast module.
> >>>>>>>>>>>
> >>>>>>>>>>> Sean
> >>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: clayclay911@gmail.com [mailto:clayclay911@gmail.com] On Behalf
> >>>>>>>>>>>> Of Clayton Turner
> >>>>>>>>>>>> Sent: Friday, August 08, 2014 2:08 PM
> >>>>>>>>>>>> To: dev@ctakes.apache.org
> >>>>>>>>>>>> Subject: v_snomed_fword_lookup view
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Everyone:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have a question about how the v_snomed_fword_lookup view works when
> >>>>>>>>>>>> running the CPE.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So my understanding of the view is that it is a view comprised of the
> >>>>>>>>>>>> ytex.umls_aui_fword table, the umls.mrconso table and bits/pieces from
> >>>>>>>>>>>> other umls tables.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I feel like this is not completely correct or my idea of how the join
> >>>>>>>>>>>> to create the view works is off. For example, let's say I want the CPE
> >>>>>>>>>>>> to find "malar ____" (e.g. malar rash) as a concept in the annotations.
> >>>>>>>>>>>> It never happens after running my CPE descriptor and I cannot find it
> >>>>>>>>>>>> in my v_snomed_fword_lookup view.
> >>>>>>>>>>>>
> >>>>>>>>>>>> select count(*) from umls_aui_fword where fword='malar'; yields 34 results
> >>>>>>>>>>>> select count(*) from umls.mrconso where str='malar'; yields 3 results.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So clearly these two tables know what the cui and context(s) are for
> >>>>>>>>>>>> malar ____. Yet, whenever I run a gold standard set of notes through
> >>>>>>>>>>>> the CPE, malar is constantly flagged as just a word token and the
> >>>>>>>>>>>> concept is never grabbed. This is recurrent for lots of other concepts,
> >>>>>>>>>>>> as well, I just wanted to use an example to illustrate my issue.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Some troubleshooting I already went through:
> >>>>>>>>>>>> 1) Reinstalled ytex and umls database objects
> >>>>>>>>>>>> 2) Reinstalled a second time after redownloading umls through
> >>>>>>>>>>>> metamorphosys, ensuring that snomed vocabularies were included (also
> >>>>>>>>>>>> checked file sizes and noticed a big difference so I know those
> >>>>>>>>>>>> vocabularies ARE included)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Anyone got any ideas as to what the issue could be?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you,
> >>>>>>>>>>>> Clayton Turner
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Clayton Turner
> >>>>>>>>>> email: caturner3@g.cofc.edu
> >>>>>>>>>> phone: (843)-424-3784
> >>>>>>>>>> web: claytonturner.blogspot.com
> >>>>>>>>>>
> >>>>>>>>>> ----------------------------------------------------------------------
> >>>>>>>>>> "When scientifically investigating the natural world, the only thing
> >>>>>>>>>> worse than a blind believer is a seeing denier."
> >>>>>>>>>> - Neil deGrasse Tyson
> >>>>>>>>>>
> >
> >

