ctakes-dev mailing list archives

From Pei Chen <chen...@apache.org>
Subject Re: UmlsConcept subject
Date Thu, 30 Jul 2015 21:07:59 GMT
Tomasz,
IIRC, the code in SubjectCleartkAnalysisEngine.java should show the
feature extractors that are used. I believe there is an enum of preset
feature sets, but I do not recall exactly which one performed best on
the test set, so it is probably best to check the source code.

I think adding the plain-sentence examples in Jira would be a great
help, since we can use them for unit testing at a minimum.
Currently, there is no easy way to 'append' training data, so one
has to create a new training set with the examples in it. The code used
for training is also in the project; it should be in the **/eval/*
namespaces. I believe the gold standard was created in XML (either
Knowtator or Anafora).

Hope that helps.
--Pei

On Thu, Jul 23, 2015 at 10:33 AM, Tomasz Oliwa <oliwa@uchicago.edu> wrote:
> What format (features, labels) is best suited for additional training examples?
>
> The SubjectCleartkAnalysisEngine class loads /org/apache/ctakes/assertion/models/subject/model.jar,
> which contains a liblinear ClearTK model.
>
> The model has 3 features, label 12 3.
>
> But what exactly are the features, and how are they derived?
>
> What does the target class look like? Is it really differentiating between "patient",
> "brother", "sister", etc., or is it a binary decision model between "patient" and "family_history"
> (the latter is what it looks like to me)?
>
> This is not documented.
>
> Tomasz
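
Since the question above concerns what is inside model.jar, one low-effort first step is to list the jar's entries, because a jar is an ordinary zip archive. The following is only a sketch in Python: the helper names list_jar_entries and build_demo_jar are made up for illustration, and the demo jar contains placeholder entries, not the real contents of the cTAKES model (which are not described in this thread).

```python
import io
import zipfile


def list_jar_entries(jar_source):
    """Return (name, size) pairs for every file in a jar (a jar is a zip archive).

    jar_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar_source) as jar:
        return [(info.filename, info.file_size)
                for info in jar.infolist()
                if not info.is_dir()]


def build_demo_jar():
    """Build a tiny in-memory jar with placeholder entries (assumption: demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr("example-entry.txt", "placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Replace build_demo_jar() with the path to a locally extracted model.jar.
    for name, size in list_jar_entries(build_demo_jar()):
        print(f"{size:>8}  {name}")
```

Listing the entries only shows which files the model ships with; how the features are derived still has to be read from the feature extractors in the source code, as suggested in the reply.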
