ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject sentence detector model
Date Sat, 27 Sep 2014 13:56:52 GMT
I have been working on the sentence detector newline issue, training a model to probabilistically
split sentences on newlines rather than forcing a sentence break at every newline. I have checked
in a model to the repo under ctakes-core-res. I also attached a patch for ctakes-core to the JIRA issue:
https://issues.apache.org/jira/browse/CTAKES-41

for people to test. So far in my testing, it doesn't seem to break on notes where cTAKES
worked well before (those where newlines are always sentence breaks), and it is a slight
improvement on notes where newlines may or may not be sentence breaks. Once the change is
checked in, we can continue improving the model by adding more data and features, but the first
hurdle I'd like to get past is making sure it runs well enough on the type of data that the
old model worked well on. Let me know if you have any questions.

Thanks
Tim
