ctakes-user mailing list archives

From "Masanz, James J." <Masanz.Ja...@mayo.edu>
Subject RE: cTakes chunking problem.
Date Fri, 31 Oct 2014 14:30:34 GMT
Some domain-specific data was already used in creating the POS and chunking models.

For info on the chunker, see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+-+Chunker
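Phrase chunkers are commonly trained and evaluated on BIO-style tags: B-NP begins a noun phrase, I-NP continues it, and O marks a token outside any chunk. The sketch below assumes that tagging scheme and shows how a tag sequence maps back to phrase spans. The function name `bio_to_chunks` and the example tags are hypothetical illustrations, not cTAKES output or a cTAKES API.

```python
def bio_to_chunks(tokens, tags):
    """Group BIO-tagged tokens into (chunk_type, start, end, text) tuples.

    `end` is exclusive. An I- tag that does not continue a chunk of the same
    type is treated as starting a new chunk (a common lenient convention).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    chunks = []
    current_type = None  # type of the chunk being built, or None if outside
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, ctype = tag.partition("-")  # "B-NP" -> ("B", "-", "NP")
        continues = prefix == "I" and ctype == current_type
        # Close the open chunk when this token does not extend it.
        if current_type is not None and not continues:
            chunks.append((current_type, start, i, " ".join(tokens[start:i])))
            current_type = None
        # Open a new chunk on B-, or on an I- that could not continue one.
        if prefix in ("B", "I") and not continues:
            current_type, start = ctype, i
    # Flush a chunk that runs to the end of the sentence.
    if current_type is not None:
        chunks.append((current_type, start, len(tags), " ".join(tokens[start:])))
    return chunks


if __name__ == "__main__":
    tokens = ["The", "patient", "denies", "chest", "pain", "."]
    tags = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
    print(bio_to_chunks(tokens, tags))
    # [('NP', 0, 2, 'The patient'), ('VP', 2, 3, 'denies'), ('NP', 3, 5, 'chest pain')]
```

Because the chunk boundaries come entirely from the tags, two systems that tokenize or POS-tag a sentence differently will usually produce different chunks as well.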

Tokenization is rule-based within Apache cTAKES.
The default tokenizer is described here:
http://ctakes.apache.org/apidocs/3.1.1/ctakes-core/org/apache/ctakes/core/nlp/tokenizer/TokenizerPTB.html
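To illustrate what "rule-based" tokenization means, here is a minimal Python sketch of regex-driven tokenization loosely modeled on Penn Treebank conventions (splitting off punctuation and clitics such as n't). It is an assumption-laden illustration, not the TokenizerPTB code; the names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Hypothetical, simplified rules in the spirit of Penn Treebank (PTB) tokenization.
# This is NOT the cTAKES TokenizerPTB implementation; alternatives are tried in order.
TOKEN_PATTERN = re.compile(
    r"""
    \d+(?:\.\d+)?            # numbers, keeping decimals such as 2.5 together
  | \w+(?=n't\b)             # stem of a negated contraction, e.g. does in doesn't
  | n't\b                    # the negation clitic itself
  | \w+(?:-\w+)*             # words, keeping internal hyphens
  | '(?:s|re|ve|ll|d|m)\b    # other clitics, e.g. 's
  | [^\w\s]                  # any other single punctuation character
    """,
    re.VERBOSE | re.IGNORECASE,
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens; whitespace is dropped, never returned as a token."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("The patient doesn't take 2.5 mg of aspirin."))
    # ['The', 'patient', 'does', "n't", 'take', '2.5', 'mg', 'of', 'aspirin', '.']
```

The order of the alternatives is the whole design: the number rule runs before the punctuation rule so that 2.5 stays one token, and the contraction lookahead runs before the general word rule so that doesn't splits into does and n't. Rules like these are deterministic, so a different tokenizer (for example, one bundled with another tagger) can easily yield different tokens, and therefore different tags, for the same text.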


-- James

________________________________
From: Bala Krishnan [balkiprasanna1984@gmail.com]
Sent: Friday, October 31, 2014 2:25 AM
To: user@ctakes.apache.org
Subject: cTakes chunking problem.

Hi,

I just have a couple of clarifications. cTAKES uses various open-source NLP libraries for
sentence tokenization, POS tagging and chunking. Can anyone tell me which trained models are
used for POS tagging and chunking? Are they based on the GENIA corpus? I tried using the GENIA tagger, but
it gives me different results from cTAKES. Can anyone suggest some ideas for incorporating
domain-specific corpora into tagging and chunking in cTAKES?

Regards,
Prasanna
