ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: Question about the pipeline
Date Wed, 04 Feb 2015 00:43:30 GMT
Hi Tol,

> Essentially, I want to know how to set up the cTAKES objects correctly into a pipeline
> in a Java program, so that medical texts are annotated the way the GUI does it. I would really
> appreciate any hints on how to accomplish this.

Looking at your embedded code, I think that you've got the general idea of how to do everything.
Perhaps you are wondering how to create custom pipelines by programmatically adding chosen
processors?

Tim Miller made a great addition (imo) to the cTAKES code with the
org.apache.ctakes.clinicalpipeline.ClinicalPipelineFactory class. Perhaps you can take a look at that and see if it helps?

Sean

-----Original Message-----
From: Tol O. [mailto:toltox@gmail.com] 
Sent: Tuesday, February 03, 2015 7:35 PM
To: dev@ctakes.apache.org
Subject: Re: Question about the pipeline


Sean,

Thank you for the detailed reply.

As you mentioned, I had to undo the capitalization that Outlook added. Also, in case somebody
else wants to use the code and cannot get it to run: the getFilesInDir method needs to return
the populated Collection<File> fileList, the variable final File[] fileList and its
usage should be renamed to something else (because a variable with that name already exists), and the main
method needs to throw an IOException.
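If you would rather not declare `throws IOException` on `main`, the reading step can handle the exception itself, as the "or handle ioE herein" remark in the quoted code suggests. The sketch below is one way to do that, assuming the notes are UTF-8 encoded; `NoteReader` and `readNote` are hypothetical names, not part of cTAKES or uimaFIT.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper class, not part of cTAKES or uimaFIT.
final class NoteReader {

    private NoteReader() {
    }

    /**
     * Reads the whole note as UTF-8 (an assumption about the input encoding).
     * An IOException is reported on stderr and swallowed, so the caller gets
     * Optional.empty() and can skip that note instead of aborting the run.
     */
    static Optional<String> readNote(final Path path) {
        try {
            return Optional.of(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
        } catch (final IOException ioE) {
            System.err.println("Could not read " + path.toAbsolutePath() + ": " + ioE);
            return Optional.empty();
        }
    }
}
```

In the per-file loop this could be used as `NoteReader.readNote( file.toPath() ).ifPresent( note -> ... )`, so `main` no longer needs a `throws` clause. The trade-off is that unreadable files are silently skipped apart from the stderr message.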

I think these were all the changes I made so that the txt files from a folder are added to
the collection. Many thanks again.
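If the collection should contain only the `.txt` files of a folder, one option is `java.nio.file.Files.newDirectoryStream` with a glob filter. This is a sketch under stated assumptions: `TxtFiles` and `listTxtFiles` are hypothetical names, subdirectories are not searched, and whether the `*.txt` glob is case-sensitive depends on the platform's file system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of cTAKES.
final class TxtFiles {

    private TxtFiles() {
    }

    /** Returns the readable regular *.txt files directly inside dir, sorted by path. */
    static List<Path> listTxtFiles(final Path dir) throws IOException {
        final List<Path> txtFiles = new ArrayList<>();
        // The glob "*.txt" matches entry names ending in ".txt"; it does not recurse.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (final Path path : stream) {
                // A directory could also be named "something.txt", so check it is a file.
                if (Files.isRegularFile(path) && Files.isReadable(path)) {
                    txtFiles.add(path);
                }
            }
        }
        // Directory iteration order is unspecified; sort so runs are repeatable.
        Collections.sort(txtFiles);
        return txtFiles;
    }
}
```

Each returned `Path` can be turned back into a `File` with `path.toFile()` if the rest of the code works with `File` objects.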

What I am looking to do is also what the description in "ExampleAggregatePipeline" says: "running
a pipeline programatically w/o uima xml descriptor xml files". As I understand it, this is accomplished
with the uimaFIT classes, so that AEs can be defined in Java, added to a pipeline
and run directly.

The uimaFIT page gives a nice Java snippet that uses uimaFIT in a similar way to the cTAKES
example; I pasted the few Java lines below at [1].
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.introduction


I would like to use cTAKES in my own Java programs such that, just like in ExampleAggregatePipeline,
uimaFIT can be used to create and run a cTAKES pipeline that annotates medical texts. Then I could
also output the result as CAS files, just like the CVD GUI does. This would let me add or modify
my own AnalysisEngines directly.

Essentially, I want to know how to set up the cTAKES objects correctly into a pipeline in
a Java program, so that medical texts are annotated the way the GUI does it. I would really
appreciate any hints on how to accomplish this.

Following your code example for reading the files, the outline of the idea is:

for ( File file : files ) {
    final String note = getTextInFile( file );
    JCas jCas = JCasFactory.createJCas();
    jCas.setDocumentText( note );

    // 1. create the AnalysisEngines for the tokenizer, tagger and other cTAKES components
    //    needed to annotate medical texts
    // 2. runPipeline( jCas, ... );
}

[1]
The code snippet from uimaFIT:

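import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine;
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;
import static org.apache.uima.fit.util.JCasUtil.iterate;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

// MyTokenizer, MyTagger and Token are placeholder classes from the uimaFIT documentation
JCas jCas = JCasFactory.createJCas();
jCas.setDocumentText("some text");

AnalysisEngine tokenizer = createEngine(MyTokenizer.class);
AnalysisEngine tagger = createEngine(MyTagger.class);

runPipeline(jCas, tokenizer, tagger);

for (Token token : iterate(jCas, Token.class)) {
    System.out.println(token.getTag());
}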

Tol O.


Finan, Sean <Sean.Finan@...> writes:

> 
> Hi Tol (and Maite),
> 
> I'm not entirely certain that I understand the question, but here is an
> attempt to help.  If I'm oversimplifying, then I apologize.
> 
> I think that ExampleAggregatePipeline is intended to represent a very simple
> single-note pipeline, and that custom code could be produced by using it as an example.
> 
> If you want to process texts in a directory, a web search will turn up plenty of
> ways to list files in a directory and read text from files.
> org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader might be what you used
> in the CPE, and you can certainly peruse its code and take what you need.  Or, if you
> decide to write a simple DIY version, here is one possibility:
> 
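> import java.io.File;
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.ArrayList;
> import java.util.Collection;
> 
> public class NotesInDirectory {  // hypothetical class name
> 
>    public static Collection<File> getFilesInDir( final File directory ) {
>       final Collection<File> fileList = new ArrayList<>();
>       // listFiles() returns null if the path is not a directory or cannot be read
>       final File[] dirFiles = directory.listFiles();
>       if ( dirFiles == null ) {
>          System.err.println( "please check the directory " + directory.getAbsolutePath() );
>          System.exit( 1 );
>       }
>       for ( final File file : dirFiles ) {
>          // skip subdirectories and unreadable entries
>          if ( file.isFile() && file.canRead() ) {
>             fileList.add( file );
>          }
>       }
>       return fileList;
>    }
> 
>    // declares IOException; alternatively, handle the IOException in here
>    public static String getTextInFile( final File file ) throws IOException {
>       final Path nioPath = file.toPath();
>       // assumes the notes are UTF-8 encoded
>       return new String( Files.readAllBytes( nioPath ), StandardCharsets.UTF_8 );
>    }
> 
>    public static void main( final String... args ) throws IOException {
>       if ( args.length == 0 || args[0].isEmpty() ) {
>          System.out.println( "Enter a directory path" );
>          System.exit( 0 );
>       }
>       final Collection<File> files = getFilesInDir( new File( args[0] ) );
>       for ( final File file : files ) {
>          final String note = getTextInFile( file );
>          // ---  Insert here code a la ExampleAggregatePipeline  ---
>          // ---  swap out the writer in ExampleAggregatePipeline with a CasIOUtil method (below)  ---
>       }
>    }
> }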
> 
> I must admit that I have never used it directly, but there is an XMI file
> writing method in org.apache.uima.fit.util.CasIOUtil named writeXmi( JCas jCas, File file ).
> You could give it a try and see if it produces the type of output that you want.
> The same utility class has a writeXCas(..) method.
> 
> If the above has absolutely nothing to do with your needs, then please send me a
> bulleted list of items, an example workflow, etc., and I'll see if I can be of service.
> 
> Oh, and I wrote the above code freehand, so MS Outlook is adding capital letters, etc.
> If you cut and paste, you'll need to change that.  Plus, I haven't run or compiled it, so
> there might be a typo, a missed exception or something.  Or it may not work (in which
> case I'll throw in a little more effort).
> 
> Sean
> 
> -----Original Message-----
> From: Tol O. [mailto:toltox@...]
> Sent: Monday, February 02, 2015 6:56 PM
> To: dev@...
> Subject: Re: Question about the pipeline
> 
> Maite Meseure Hugues <meseure.maite <at> ...> writes:
> 
> > 
> > Hello all,
> > 
> > Thank you for your preceding answers.
> > I have a few questions regarding the pipeline example for running cTAKES
> > programmatically.
> > I am running ExampleAggregatePipeline.java with
> > ExampleHelloWorldAnnotator, but I would like to know how I can change
> > it to run on my own data, as in the CPE, where we can choose the directory of our data.
> > My second question is about the XML output generated by the CPE:
> > can I get the same XML output using the example pipeline, and if so, how?
> > Thanks for your time.
> 
> I would like to ask the same question. After successfully setting up
> cTAKES following the Developers Guide, I would also like to use a modified
> ExampleAggregatePipeline to output a CAS file identical to the output
> obtained by the CPE or the CVD when following the Users Guide.
> 
> This would be a great help to developers: a starting class for
> programmatically obtaining an annotated file from plain-text or XML input,
> the same as through the two GUIs.
> 
> Right now I am reading through the Component Use Guide to replicate the
> CPE or CVD tutorial with the test input, but it is a bit overwhelming.
> 
> Any pointers or suggestions would be really appreciated.
> 
> Tol O.
> 
> 




