ctakes-dev mailing list archives

From Dmitriy Dligach <dmitriy.dlig...@childrens.harvard.edu>
Subject Re: resources for training modules
Date Fri, 30 Aug 2013 13:46:08 GMT
Hi Will,

Retraining the relation extractor should be fairly easy. The 
instructions I am about to give you apply if you are using cTAKES 3.0. 
However, if you are planning to use the trunk version, my instructions 
may no longer be accurate. Relation extraction has undergone some 
changes recently in connection with the cTAKES-190 issue and I don't fully 
understand these most recent changes yet (but I am working on it).

1. Run PreprocessAndWriteXmi in the eval package, specifying the 
location of the text of the notes, the location of the gold standard 
relation annotations, and the output directory. This class will run all 
the preprocessing that is required for relation extraction and add gold 
standard relation annotations to the CAS. The resulting CASes will be 
saved to disk as XMI files.

2. Run RelationExtractorEvaluation, passing it the location of the XMI 
files obtained in the previous step and the --grid-search option. This 
class will use the annotations in the XMI files to find the optimal 
training parameters using grid search and n-fold cross-validation. After 
the execution completes, record the best set of parameters found by the 
grid search. If you don't have a lot of time, this step can be skipped 
(you can just use the default SVM parameters).
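
For readers unfamiliar with the idea in step 2, here is a minimal, 
generic sketch of grid search with n-fold cross-validation in plain 
Python. It is not the RelationExtractorEvaluation code and makes no 
claim about how that class is implemented. The parameter names (cost, 
gamma), the function names, and the toy scoring function are all 
assumptions made up for illustration; in practice the scorer would 
train an SVM on the training folds and return a score such as F1 on the 
held-out fold.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_items, n_folds):
    """Yield (train_idx, test_idx) pairs; item i goes to test fold i % n_folds."""
    for fold in range(n_folds):
        test = [i for i in range(n_items) if i % n_folds == fold]
        train = [i for i in range(n_items) if i % n_folds != fold]
        yield train, test


def grid_search(items, param_grid, train_and_score, n_folds=2):
    """Return (best_params, best_mean_score) over every parameter combination."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for train_idx, test_idx in k_fold_indices(len(items), n_folds):
            train = [items[i] for i in train_idx]
            test = [items[i] for i in test_idx]
            scores.append(train_and_score(train, test, params))
        fold_mean = mean(scores)  # average the score across the folds
        if fold_mean > best_score:
            best_params, best_score = params, fold_mean
    return best_params, best_score


def toy_score(train, test, params):
    # Hypothetical stand-in for "train on `train`, evaluate on `test`".
    # It ignores the data and simply peaks at cost=1.0, gamma=0.1.
    return -abs(params["cost"] - 1.0) - abs(params["gamma"] - 0.1)


if __name__ == "__main__":
    grid = {"cost": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}
    print(grid_search(list(range(10)), grid, toy_score, n_folds=5))
```

The parameter set returned by such a search corresponds to what step 2 
asks you to record.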

3. Update the model parameters in the main() method of 
RelationExtractorTrain (pipelines package) to the values found by the 
grid search. Run RelationExtractorTrain, specifying the location of the 
XMI files. This class will (a) create a model that is necessary for 
deployment of the relation module, and (b) create the descriptor files 
which will ensure that the relation AEs can be used as a part of a 
UIMA pipeline.

If you are planning to annotate your data, it might be easier to use 
Knowtator since we already have a gold standard reader for Knowtator. If 
you want to use a different annotation tool, you just have to make sure 
you add the manual annotations to the gold view of the XMI files. The 
relation extractor reads the gold standard annotations from the gold view.
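
If the annotations come from brat, a first step toward that would be 
reading entity spans and relations out of brat's standoff (.ann) files. 
The following is a rough sketch in Python, assuming the usual 
tab-separated standoff layout with "T" lines for text-bound annotations 
and "R" lines for binary relations. The function name and the labels in 
the sample are made up for illustration, discontinuous spans are 
skipped, and the conversion of the result into cTAKES annotations in 
the gold view is not shown.

```python
def parse_brat_ann(ann_text):
    """Parse brat standoff text into (entities, relations).

    entities:  {id: (label, begin, end)} for contiguous text-bound annotations
    relations: [(label, arg1_id, arg2_id)] for binary relations
    """
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            type_and_span = fields[1]
            if ";" in type_and_span:
                continue  # discontinuous span; not handled in this sketch
            label, begin, end = type_and_span.split(" ")
            entities[ann_id] = (label, int(begin), int(end))
        elif ann_id.startswith("R"):
            parts = fields[1].split()
            label = parts[0]
            # Remaining parts look like "Arg1:T1" and "Arg2:T2".
            args = dict(part.split(":", 1) for part in parts[1:])
            relations.append((label, args["Arg1"], args["Arg2"]))
    return entities, relations


if __name__ == "__main__":
    # Hypothetical sample; the labels are illustrative only.
    sample = (
        "T1\tSign_Symptom 0 4\tpain\n"
        "T2\tAnatomical_Site 12 16\tknee\n"
        "R1\tlocation_of Arg1:T1 Arg2:T2\n"
    )
    print(parse_brat_ann(sample))
```

The character offsets recovered this way are what you would use to 
create the corresponding annotations in the gold view.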

Hope this helps,

Dima


On 08/29/2013 06:07 PM, William Karl Thompson wrote:
> Hello all,
>
> I'm interested in training the relation extractor on some annotated notes from Northwestern
> clinical data, and I understand that cleartk is currently being used for this purpose in the
> cTAKES project.  Could someone provide some pointers on how to go about using cleartk to train
> models that can then be invoked by a cTAKES module? Again, my focus for now is on the relation
> extractor. In case it's relevant, I'm intending to use the brat rapid annotation tool (http://brat.nlplab.org/)
> to generate a gold standard corpus.
>
> Cheers,
>
> Will
>

-- 
Dmitriy Dligach, PhD
Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
(617) 919-3596

