ctakes-user mailing list archives

From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject: Paragraph Chunking in cTAKES
Date: Wed, 23 Sep 2015 18:07:39 GMT

Hi Folks,

I am looking for feedback on how accurate cTAKES annotations are when the
input text is not made up of properly formed paragraphs.
Is malformed input known to significantly affect annotation accuracy or
performance?
Does anyone have a 'golden' input example where cTAKES works best in terms
of annotation accuracy and performance?

My situation is as follows: right now I use Apache Tika to parse a
multitude of documents, and I feed the parse results from those documents
into cTAKES for annotation. Sometimes Tika is not able to form paragraphs
correctly because a paragraph is split across a page break.

Another example is when footer information (such as page numbers, DOIs,
journal names, etc.) appears between pages.
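
To make the two problems concrete, here is a minimal Python sketch of a
cleanup pass that could sit between the Tika output and cTAKES. It is an
illustration only, not something Tika or cTAKES provides. The names
clean_pages and is_footer, the footer patterns, and the assumption that
pages arrive separated by a form feed character are all hypothetical and
would need adjusting to the real parse output.

```python
import re

# Hypothetical footer patterns; real documents will need their own.
FOOTER_PATTERNS = [
    # A bare page number, or "Page 3 of 10"
    re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE),
    # A line holding only a DOI, e.g. "doi:10.1000/xyz123"
    re.compile(r"^\s*(doi:|https?://(dx\.)?doi\.org/)\s*10\.\d{4,9}/\S+\s*$",
               re.IGNORECASE),
]

# Text that ends a sentence, optionally followed by closing quotes/brackets.
SENTENCE_END = re.compile(r"[.!?][\"')\]]*$")


def is_footer(line):
    """Return True if the line looks like footer residue."""
    return any(p.match(line) for p in FOOTER_PATTERNS)


def clean_pages(text, page_sep="\f"):
    """Drop footer-like lines and rejoin paragraphs split across pages.

    Assumes (hypothetically) that pages are separated by page_sep.
    """
    pages = []
    for page in text.split(page_sep):
        lines = [ln for ln in page.splitlines() if not is_footer(ln)]
        body = "\n".join(lines).strip()
        if body:
            pages.append(body)

    merged = ""
    for body in pages:
        if not merged:
            merged = body
        elif SENTENCE_END.search(merged):
            # Previous page ended on a sentence boundary: new paragraph.
            merged += "\n\n" + body
        else:
            # Previous page ended mid-sentence: treat as the same paragraph.
            merged += " " + body
    return merged
```

The sentence-boundary test is a crude heuristic: a page that happens to end
on a full stop in the middle of a paragraph will still be split, and a
numeric line in the body text would be dropped as a page number.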

Thanks for any feedback.
Lewis
