ctakes-user mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: Clinical Pipeline Solution
Date Mon, 21 Oct 2013 21:08:46 GMT
Hi Eli,
Based on the sample code, I presume you are using uimaFIT to wire up the
pipeline.
InputStreamCollectionReader probably came from cleartk-utils. Where it came
from is probably not so important though...
CollectionReaders were designed to read in a collection of documents for
batch processing (hence the examples include a Files In a Directory example).
If you are really looking to handle dynamic text in some sort of real-time or
SOA architecture, then you may want to look at creating the JCas yourself and
setting the text on it directly. uimaFIT also has a good example of this [2].
Something like:

If it's batch processing, you may find Tim's bag-of-CUIs example [1] helpful,
given what you describe...
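
For the <doc_ID, CUI, freq> output itself, the counting step does not depend
on UIMA at all. Below is a minimal sketch in plain Java, assuming the CUI
mentions for each document have already been collected from the pipeline
output. The class name CuiFrequencyTable, the count method, and the document
IDs and CUI strings are all hypothetical placeholders; this is not taken from
the BagOfCUIsGenerator code in [1].

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns per-document CUI mentions into frequency counts.
class CuiFrequencyTable {

    /**
     * Counts how often each CUI occurs in each document.
     * Input: document ID -> every CUI mention found in that document
     * (duplicates included). Output: document ID -> (CUI -> frequency),
     * with both levels sorted by key.
     */
    static Map<String, Map<String, Integer>> count(Map<String, List<String>> cuisByDoc) {
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : cuisByDoc.entrySet()) {
            Map<String, Integer> freqs = new TreeMap<>();
            for (String cui : doc.getValue()) {
                // Add 1 to the running count for this CUI, starting from 1 if unseen.
                freqs.merge(cui, 1, Integer::sum);
            }
            table.put(doc.getKey(), freqs);
        }
        return table;
    }

    public static void main(String[] args) {
        // Placeholder document IDs and CUI strings, for illustration only.
        Map<String, List<String>> cuisByDoc = new HashMap<>();
        cuisByDoc.put("doc1", Arrays.asList("C0000001", "C0000002", "C0000001"));
        cuisByDoc.put("doc2", Arrays.asList("C0000002"));

        // Print one tab-separated <doc_ID, CUI, freq> row per line.
        for (Map.Entry<String, Map<String, Integer>> doc : count(cuisByDoc).entrySet()) {
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                System.out.println(doc.getKey() + "\t" + e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

In a real run, the list for each document would be filled from whatever
annotations the pipeline leaves in the CAS, one entry per CUI mention.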

[1]
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfCUIsGenerator.java
[2]
http://svn.apache.org/repos/asf/uima/sandbox/uimafit/trunk/uimafit-examples/src/main/java/org/apache/uima/fit/examples/tutorial/ex1/RoomNumberAnnotatorPipeline.java

Hope that helps...


On Mon, Oct 21, 2013 at 4:22 PM, eli mizzou <eli.mizzu@gmail.com> wrote:

> Hi cTAKES folks,
>
> I am trying to figure out how to run the Clinical Document Pipeline from
> Java. I have a set of clinical documents as plain text. I want to parse
> these documents and extract a list of <doc_ID, CUI, freq> triples; that is,
> in document *doc_ID*, *CUI* occurs with frequency *freq*. I spent
> several days installing cTAKES and looking for a solution. I narrowed it down
> to ClinicalPipelineWithUmls.java, which takes a test text and runs
> SimplePipeline with an AnalysisEngineDescription. Here is part of the code:
>
> String documentText = "Text of document to test goes here, such as the "
>     + "following. No edema, some soreness, denies pain.";
> InputStream inStream =
>     InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
> CollectionReader collectionReader =
>     InputStreamCollectionReader.getCollectionReader(inStream);
> AnalysisEngineDescription pipelineIncludingUmlsDictionaries =
>     AnalysisEngineFactory.createAnalysisEngineDescription(
>         "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
> AnalysisEngineDescription xWriter =
>     AnalysisEngineFactory.createPrimitiveDescription(
>         XWriter.class,
>         XWriter.PARAM_OUTPUT_DIRECTORY_NAME, AssertionConst.evalOutputDir,
>         XWriter.PARAM_XML_SCHEME_NAME, XWriter.XMI,
>         XWriter.PARAM_FILE_NAMER_CLASS_NAME, CtakesFileNamer.class.getName());
> SimplePipeline.runPipeline(collectionReader,
>     pipelineIncludingUmlsDictionaries, xWriter);
> System.out.println("Done at " + new Date());
>
> The problem is that it cannot find "*InputStreamCollectionReader*". I
> searched for it but have had no success so far. Would you please give me a
> hint or point me in the right direction?
>
> Thanks for any help!
>
> -Eli
>
