ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: sentence detector model
Date Sat, 27 Sep 2014 17:14:03 GMT
I've just been using the OpenNLP command line cross validator on the small dataset I annotated
(along with some eyeballing). It would be cool if there were a standard clinical resource available
for this task, but I hadn't considered it much because the data I annotated pulls from multiple
datasets, and the process of arranging with different institutions to make something like
that available would probably be a nightmare.
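
For anyone who wants a quick number without the OpenNLP tooling, here is a minimal sketch of the kind of boundary scoring a cross validator reports (precision, recall, F1 over sentence-end offsets). It is not the OpenNLP implementation; the function names `boundary_scores` and `newline_break_offsets` and the toy note text are made up for illustration, and the "baseline" is only a stand-in for the old behavior of forcing a break at every newline.

```python
from typing import Iterable, List, Tuple


def newline_break_offsets(text: str) -> List[int]:
    """Hypothetical baseline: treat every newline as a sentence break.

    Returns the character offset of each newline in the text.
    """
    return [i for i, ch in enumerate(text) if ch == "\n"]


def boundary_scores(
    gold: Iterable[int], predicted: Iterable[int]
) -> Tuple[float, float, float]:
    """Score predicted sentence-end offsets against gold offsets.

    A prediction counts as correct only if it matches a gold offset exactly.
    Returns (precision, recall, f1); each is 0.0 when undefined.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy note (invented): the first newline falls mid-sentence.
    note = "Patient denies chest\npain. No fever.\nFollow up in 2 weeks.\n"
    # Gold sentence ends, annotated by hand as the offset just past each period.
    gold_ends = [26, 36, 58]
    predicted_ends = newline_break_offsets(note)
    precision, recall, f1 = boundary_scores(gold_ends, predicted_ends)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Running the same scoring over held-out folds of an annotated set would give a rough before/after comparison between the forced-newline behavior and the new model.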

Sent from my iPad. Sorry about the typos.

> On Sep 27, 2014, at 12:16 PM, "Dligach, Dmitriy" <Dmitriy.Dligach@childrens.harvard.edu> wrote:
> Tim, thanks for working on this!
> Question: do we have some formal way of evaluating the sentence detector? Maybe we should
> come up with some dev set that would include examples from MIMIC...
> Dima
>> On Sep 27, 2014, at 8:57, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>> I have been working on the sentence detector newline issue, training a model to probabilistically
>> split sentences on newlines rather than forcing sentence breaks. I have checked in a model
>> to the repo under ctakes-core-res. I also attached a patch to ctakes-core to the JIRA issue:
>> https://issues.apache.org/jira/browse/CTAKES-41
>> for people to test. The status of my testing is that it doesn't seem to break on
>> notes where cTAKES worked well before (those where newlines are always sentence breaks), and
>> is a slight improvement on notes where newlines may or may not be sentence breaks. Once the
>> change is checked in we can continue improving the model by adding more data and features,
>> but the first hurdle I'd like to get past is making sure it runs well enough on the type of
>> data that the old model worked well on. Let me know if you have any questions.
>> Thanks
>> Tim
