ctakes-dev mailing list archives

From Tim Miller <timothy.mil...@childrens.harvard.edu>
Subject training data for sentence detector
Date Fri, 07 Feb 2014 22:24:23 GMT
We were discussing the sentence detector in person here the other 
day, and Pei had a thought: depending on which sources you used to 
train the sentence detector, we might be able to do something 
equivalent here in Boston using the SHARP, THYME, and MIPACQ data, which 
are largely from Mayo and probably similar to what you use, and then 
augmenting that with the little bit of MIMIC that I annotated. I don't 
know how that compares size-wise to the dataset you are using. Is yours 
quite large, or do you think data derived from those other projects 
would be enough? What do you think of this plan? Anyone else?
