ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject FW: UWM graduate student, need help on using CTakes
Date Wed, 10 Apr 2013 19:21:45 GMT
Hi Soheil,
[including dev@ctakes]
I think use case (1) is a pretty common one.
One can configure the input directory, the pipeline, and the output directory, then run UIMA's
Collection Processing Engine (CPE) from the command line; see
ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext.xml for an example.
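As a rough sketch of that command-line route (not an official cTAKES tool), the Java class below assembles the `java` command that would hand a CPE descriptor such as test_plaintext.xml to UIMA's example runner `org.apache.uima.examples.cpe.SimpleRunCPE`. It assumes the uimaj-examples jar and the cTAKES jars and resources are on the classpath; the classpath entries and heap size in `main()` are hypothetical placeholders, not the actual layout of a cTAKES install.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build the command line for running a UIMA CPE descriptor.
// ASSUMPTION: UIMA's example runner (uimaj-examples jar) and the cTAKES jars
// are reachable on the classpath; the entries used in main() are hypothetical.
class CpeCommand {
    static final String RUNNER = "org.apache.uima.examples.cpe.SimpleRunCPE";

    static List<String> buildCommand(List<String> classpathEntries,
                                     String cpeDescriptor, String maxHeap) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx" + maxHeap);              // JVM heap limit, e.g. "2g"
        cmd.add("-cp");
        // File.pathSeparator is ':' on Unix and ';' on Windows
        cmd.add(String.join(File.pathSeparator, classpathEntries));
        cmd.add(RUNNER);                         // main class to launch
        cmd.add(cpeDescriptor);                  // the CPE descriptor is the runner's argument
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical install layout; "lib/*" is expanded by the JVM, not the shell.
        List<String> classpath = Arrays.asList("lib/*", "resources");
        List<String> cmd = buildCommand(classpath,
                "ctakes-clinical-pipeline/desc/collection_processing_engine/test_plaintext.xml",
                "2g");
        System.out.println(String.join(" ", cmd));

        // To actually launch the pipeline, uncomment:
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // System.exit(p.waitFor());
    }
}
```

Keeping command assembly separate from launching lets a wrapper script print and check the command before running it on a shared server.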

But I wonder whether we could simplify or enhance this with a set of CLI tools...

--Pei



From: Soheil Moosavi [mailto:ssoheilmn@gmail.com]
Sent: Wednesday, April 10, 2013 3:07 PM
To: Chen, Pei
Cc: Savova, Guergana; Rashmi Prasad
Subject: Re: UWM graduate student, need help on using CTakes

Dear Pei,

Thank you very much for your step-by-step and descriptive comments. I have read both the User
Guide and the Developer Guide for cTAKES on the Apache website. As far as I understand, the User
Guide shows how to run the cTAKES GUI and use its interface to work with components, while the
Developer Guide teaches developers how to get the source code, work with it, and add their own
annotators.

I'll explain in more detail how we usually use library files in our projects. I have two usages
in mind:
1. Here at UWM we install many tools on the server and let students connect to the server
and use them. For example, students can follow these steps to use the tokenizer, sentence
splitter, POS tagger, lemmatizer, and NER of the Stanford CoreNLP library:

     - =============================================
     - Connect to the unix server with your username and password
     - cd /data02/tools/StanfordCoreNLP/stanford-corenlp-2012-07-09
     - java -cp stanford-corenlp-2012-07-09.jar:stanford-corenlp-2012-07-06-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt -outputExtension .stanfordnlp
     - -annotators specifies which annotators you need. You can shorten the list if you don't
need all of them.
     - -file specifies the input text file that contains your sentences.
     - The output is written to a file whose name is the input file name plus the output
extension. With the default extension (.xml), for example, an input file named "input.txt"
produces "input.txt.xml". The output format is XML.
     - =============================================

So users can easily call the jar file on the server, give it the input, and get the output to
use as part of their programs.
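To use that XML output inside a program, a minimal sketch with the JDK's built-in DOM parser could pull out each token's word, lemma, POS tag, and NER label. It assumes CoreNLP's XML layout of `<token>` elements with `<word>`, `<lemma>`, `<POS>`, and `<NER>` children; the sample string in `main()` is a hand-written, simplified fragment rather than real CoreNLP output, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read tokens from CoreNLP-style XML output.
// ASSUMPTION: <token> elements carry <word>, <lemma>, <POS> and <NER> children.
class CoreNlpXmlReader {
    // One token: surface word, lemma, POS tag and NER label.
    static final class Token {
        final String word, lemma, pos, ner;
        Token(String word, String lemma, String pos, String ner) {
            this.word = word; this.lemma = lemma; this.pos = pos; this.ner = ner;
        }
        @Override public String toString() {
            return word + "/" + lemma + "/" + pos + "/" + ner;
        }
    }

    // Text of the first descendant element with the given tag, or "" if missing.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
    }

    static List<Token> readTokens(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList tokenNodes = doc.getElementsByTagName("token");
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < tokenNodes.getLength(); i++) {
            Element t = (Element) tokenNodes.item(i);
            tokens.add(new Token(childText(t, "word"), childText(t, "lemma"),
                                 childText(t, "POS"), childText(t, "NER")));
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written, simplified sample in the assumed CoreNLP layout.
        String sample = "<root><document><sentences><sentence id=\"1\"><tokens>"
                + "<token id=\"1\"><word>Patients</word><lemma>patient</lemma>"
                + "<POS>NNS</POS><NER>O</NER></token>"
                + "<token id=\"2\"><word>improved</word><lemma>improve</lemma>"
                + "<POS>VBD</POS><NER>O</NER></token>"
                + "</tokens></sentence></sentences></document></root>";
        for (Token t : readTokens(sample)) {
            System.out.println(t);
        }
    }
}
```

For a real run, the string would instead be read from the output file written next to the input file.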

I am wondering whether such a jar file or library is available with cTAKES that lets users
call it from the command line.


2. The other way I use Stanford CoreNLP is to add the "stanford-corenlp-1.3.4.jar" file to my
Java project and import its classes. Then I can call the classes and methods and use their
output in my program.
So I am also wondering whether it is possible to do the same with cTAKES and use it as a
library of Java classes in my code.

These two ways of using jar or library files are very common. I would appreciate it if you
could let us know whether these methods can be used with cTAKES in our projects.

I really appreciate your comments and advice.

Sincerely,


Soheil Moosavi
-----------------------------------------------------------------------------------
