ctakes-dev mailing list archives

From vijay garla <vnga...@gmail.com>
Subject Re: sentence detector model
Date Mon, 29 Sep 2014 17:23:52 GMT
Why not use the i2b2 corpora?

On Monday, September 29, 2014, Dligach, Dmitriy <
Dmitriy.Dligach@childrens.harvard.edu> wrote:

> Maybe creating a made-up set of sentences would be an option? That way we
> could agree on the annotation of concrete cases. Although this would be
> more of a unit test than a corpus.
>
> Dima
>
>
>
>
> On Sep 27, 2014, at 12:15, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
>
> > I've just been using the OpenNLP command line cross validator on the
> > small dataset I annotated (along with some eyeballing). It would be cool if
> > there were a standard clinical resource available for this task, but I
> > hadn't considered it much because the data I annotated pulls from multiple
> > datasets, and the process of arranging with different institutions to make
> > something like that available would probably be a nightmare.
> > Tim
> >
> > Sent from my iPad. Sorry about the typos.
> >
> >> On Sep 27, 2014, at 12:16 PM, "Dligach, Dmitriy" <Dmitriy.Dligach@childrens.harvard.edu> wrote:
> >>
> >> Tim, thanks for working on this!
> >>
> >> Question: do we have some formal way of evaluating the sentence
> >> detector? Maybe we should come up with some dev set that would include
> >> examples from MIMIC...
> >>
> >> Dima
> >>
> >>
> >>
> >>
> >>> On Sep 27, 2014, at 8:57, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
> >>>
> >>> I have been working on the sentence detector newline issue, training a
> >>> model to probabilistically split sentences on newlines rather than forcing
> >>> sentence breaks. I have checked in a model to the repo under
> >>> ctakes-core-res. I also attached a patch for ctakes-core to the JIRA issue:
> >>> https://issues.apache.org/jira/browse/CTAKES-41
> >>>
> >>> for people to test. The status of my testing is that it doesn't seem
> >>> to break on notes where cTAKES worked well before (those where newlines are
> >>> always sentence breaks), and it is a slight improvement on notes where
> >>> newlines may or may not be sentence breaks. Once the change is checked in,
> >>> we can continue improving the model by adding more data and features, but
> >>> the first hurdle I'd like to get past is making sure it runs well enough on
> >>> the type of data that the old model worked well on. Let me know if you have
> >>> any questions.
> >>>
> >>> Thanks
> >>> Tim
> >>
>
>
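
The made-up-sentences idea discussed in this thread could be turned into a small boundary-level check. What follows is only a minimal sketch in Python, not cTAKES or OpenNLP code. The example notes, the function names (end_offsets, split_on_newlines, score) and the always-split-on-newline baseline are all assumptions made for illustration; the baseline merely stands in for the "newlines are always sentence breaks" behaviour described above.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical made-up cases: (note text, gold sentences in order).
# The first note breaks at the newline; the second wraps mid-sentence.
CASES: List[Tuple[str, List[str]]] = [
    ("Patient denies chest pain.\nNo fever or chills.",
     ["Patient denies chest pain.", "No fever or chills."]),
    ("Patient was admitted for\nshortness of breath.",
     ["Patient was admitted for\nshortness of breath."]),
]


def end_offsets(text: str, sentences: List[str]) -> Set[int]:
    """Return the character offset at which each sentence ends in text."""
    offsets: Set[int] = set()
    cursor = 0
    for sentence in sentences:
        start = text.index(sentence, cursor)  # raises ValueError if absent
        cursor = start + len(sentence)
        offsets.add(cursor)
    return offsets


def split_on_newlines(text: str) -> List[str]:
    """Illustrative baseline: treat every newline as a sentence break."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def score(cases: List[Tuple[str, List[str]]],
          detect: Callable[[str], List[str]]) -> Tuple[float, float, float]:
    """Boundary-level precision, recall and F1 over all cases."""
    tp = fp = fn = 0
    for text, gold in cases:
        gold_ends = end_offsets(text, gold)
        pred_ends = end_offsets(text, detect(text))
        tp += len(gold_ends & pred_ends)
        fp += len(pred_ends - gold_ends)
        fn += len(gold_ends - pred_ends)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = score(CASES, split_on_newlines)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Any detector that maps a string to a list of sentences could be passed to score in place of the baseline, so the same made-up cases would serve both as a regression check and as a shared reference for agreeing on the annotation of concrete cases.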
