ctakes-user mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Paragraph Chunking in cTAKES
Date Wed, 23 Sep 2015 18:12:28 GMT
+1 really interested in the reply to this :)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
Reply-To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Date: Wednesday, September 23, 2015 at 11:07 AM
To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Subject: Paragraph Chunking in cTAKES

>Hi Folks,
>
>
>I am looking for feedback on how accurate cTAKES annotations are when
>the input text is not split into properly formed paragraphs.
>
>Is this known to significantly affect annotation accuracy/performance?
>
>Does anyone have a 'golden' input example of where cTAKES works best for
>annotation accuracy and performance?
>
>
>My situation is as follows: right now I use Apache Tika to parse a
>multitude of documents, and I feed the parse results from those documents
>into cTAKES for annotation. Sometimes Tika is not able to form
>paragraphs correctly because the paragraphs are split across pages.
>
>
>
>Another example is when footer information (such as page numbers, DOIs,
>journal names, etc.) appears between pages.
>
>
>Thanks for any feedback.
>
>Lewis
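
The two problems described above (paragraphs split across pages, and footer lines such as page numbers or DOIs sitting between pages) could be handled in a cleanup step between Tika and cTAKES. What follows is only a minimal sketch under stated assumptions, not something taken from Tika or cTAKES: it assumes the extracted text marks page breaks with a form feed character, that footers are standalone lines matching a few simple patterns, and that a page ending without sentence-final punctuation continues on the next page. The names `is_footer`, `page_paragraphs`, and `rejoin_paragraphs` are hypothetical.

```python
import re

# Assumed footer shapes: a bare page number ("3", "Page 3 of 10")
# or a line holding only a DOI. Real documents will need their own patterns.
FOOTER_PATTERNS = [
    re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE),
    re.compile(r"^\s*(doi:\s*|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+\s*$",
               re.IGNORECASE),
]

SENTENCE_END = (".", "!", "?")


def is_footer(line):
    """Return True if the line looks like footer material."""
    return any(p.match(line) for p in FOOTER_PATTERNS)


def page_paragraphs(page):
    """Drop footer-like lines, then split the page on blank lines."""
    kept = [ln for ln in page.splitlines() if not is_footer(ln)]
    chunks = re.split(r"\n\s*\n", "\n".join(kept))
    # Collapse internal line breaks and whitespace within each paragraph.
    paragraphs = [" ".join(chunk.split()) for chunk in chunks]
    return [p for p in paragraphs if p]


def rejoin_paragraphs(text, page_break="\f"):
    """Merge a paragraph that was cut by a page break with its continuation."""
    result = []
    for page in text.split(page_break):
        paragraphs = page_paragraphs(page)
        # Heuristic: no sentence-final punctuation means the paragraph continues.
        if paragraphs and result and not result[-1].endswith(SENTENCE_END):
            result[-1] += " " + paragraphs.pop(0)
        result.extend(paragraphs)
    return result


if __name__ == "__main__":
    raw = ("Patient reports chest pain that began\n\n3\ndoi:10.1000/xyz123\n"
           "\fafter exercise. No fever.\n\nSecond paragraph here.")
    for paragraph in rejoin_paragraphs(raw):
        print(paragraph)
```

Both heuristics can misfire: a numeric line that is real content (a lab value on its own line, say) would be dropped as a page number, and a heading that ends a page without punctuation would be merged into the next paragraph. Whether this kind of cleanup actually improves cTAKES annotation accuracy is exactly the open question in this thread.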
