ctakes-dev mailing list archives

From Steven Bethard <stevenbeth...@apache.org>
Subject Re: resources for training modules
Date Fri, 30 Aug 2013 14:36:43 GMT
On Fri, Aug 30, 2013 at 8:46 AM, Dmitriy Dligach
<dmitriy.dligach@childrens.harvard.edu> wrote:
> Retraining the relation extractor should be fairly easy. The instructions I
> am about to give you apply if you are using cTAKES 3.0. However, if you are
> planning to use the trunk version, my instructions may no longer be
> accurate. Relation extraction has undergone some changes recently in
> connection with the cTAKES-190 issue and I don't fully understand these most
> recent changes yet (but I am working on it).

With the trunk version, there's no need to run PreprocessAndWriteXmi.
Just run RelationExtractorEvaluation or RelationExtractorTrain
directly; the XMIs will be written automatically to target/xmi. I
believe the only required argument is --batches-dir, which points to
the top-level directory whose subdirectories in turn contain the
Knowtator_XML directories. The other (optional) arguments should be
similar to what Dima described. You can see what they are by looking
at the static Options classes, and their superclasses, in
RelationExtractorEvaluation and RelationExtractorTrain.
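
Before passing a directory as --batches-dir, it may help to confirm that
it has the expected shape. The sketch below is not part of cTAKES; it
assumes the layout is <batches-dir>/<batch>/Knowtator_XML, which is one
reading of the description above, and the class and method names
(BatchesDirCheck, findKnowtatorDirs) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a cTAKES class.
public class BatchesDirCheck {

    // Assumed layout: <batchesDir>/<batch>/Knowtator_XML
    // Returns every directory named Knowtator_XML exactly two levels
    // below batchesDir, in sorted order.
    static List<Path> findKnowtatorDirs(Path batchesDir) throws IOException {
        int wantedDepth = batchesDir.getNameCount() + 2;
        try (Stream<Path> paths = Files.walk(batchesDir, 2)) {
            return paths
                .filter(Files::isDirectory)
                .filter(p -> p.getNameCount() == wantedDepth)
                .filter(p -> p.getFileName().toString().equals("Knowtator_XML"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BatchesDirCheck <batches-dir>");
            return;
        }
        List<Path> found = findKnowtatorDirs(Paths.get(args[0]));
        if (found.isEmpty()) {
            System.out.println("No Knowtator_XML directories found under " + args[0]);
        }
        for (Path p : found) {
            System.out.println(p);
        }
    }
}
```

If this prints nothing for your data, either the layout differs from the
assumption above or the wrong directory is being passed.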

> If you are planning to annotate your data, it might be easier to use
> Knowtator since we already have a gold standard reader for Knowtator. If you
> want to use a different annotation tool, you just have to make sure you add
> the manual annotations to the gold view of the XMI files.

In the trunk version, most of the SHARP-specific stuff is handled by
the SHARPXMI class. So if you need to customize things away from what
was done for SHARP, that's probably where you'll need to go. At the
moment, RelationExtractorTrain and RelationExtractorEvaluation call
static methods on SHARPXMI, which means that it's not very extensible.
We could conceivably change these methods to non-static methods, and
then extensions of relation-extractor could provide their own
implementation. We're certainly open to modifying the infrastructure
here, so if you have any suggestions please do pass them on.
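
To make the static versus non-static point concrete, here is a minimal
sketch of the general pattern. None of these names come from cTAKES:
CorpusSupport stands in for the corpus-specific logic (the role SHARPXMI
plays), Trainer stands in for a training entry point, and
selectTrainingFiles is an invented method. The real classes may look
quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not cTAKES classes.
public class ExtensibleCorpusSketch {

    // Stand-in for corpus-specific logic. Because the method is an
    // instance method rather than a static one, subclasses can override it.
    static class CorpusSupport {
        List<String> selectTrainingFiles(List<String> files) {
            return new ArrayList<>(files); // default: use every file
        }
    }

    // An extension supplies its own behavior without touching Trainer.
    static class MyCorpusSupport extends CorpusSupport {
        @Override
        List<String> selectTrainingFiles(List<String> files) {
            List<String> selected = new ArrayList<>();
            for (String f : files) {
                if (f.endsWith(".xml")) {
                    selected.add(f); // keep only .xml files
                }
            }
            return selected;
        }
    }

    // Stand-in for a training entry point. It calls through an instance
    // it was given instead of calling a static method on a fixed class.
    static class Trainer {
        private final CorpusSupport corpus;

        Trainer(CorpusSupport corpus) {
            this.corpus = corpus;
        }

        int countTrainingFiles(List<String> files) {
            return corpus.selectTrainingFiles(files).size();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.txt", "c.xml");
        System.out.println(new Trainer(new CorpusSupport()).countTrainingFiles(files));
        System.out.println(new Trainer(new MyCorpusSupport()).countTrainingFiles(files));
    }
}
```

With static calls, Trainer would be tied to one class at compile time.
Passing in an instance lets a downstream project substitute its own
subclass, which is the kind of extensibility described above.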

