incubator-ctakes-notifications mailing list archives

From "Pei Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-96) Update Dependency Parser and Semantic Role Labeler - Thanks Jinho Choi and Lee Beecker
Date Fri, 02 Nov 2012 20:13:12 GMT
Pei Chen created CTAKES-96:
------------------------------

             Summary: Update Dependency Parser and Semantic Role Labeler - Thanks Jinho Choi and Lee Beecker
                 Key: CTAKES-96
                 URL: https://issues.apache.org/jira/browse/CTAKES-96
             Project: cTAKES
          Issue Type: New Feature
            Reporter: Pei Chen
             Fix For: future enhancement


Update/create new wrappers for ClearNLP that have been trained on clinical notes (SHARP/MiPACQ).

Some notes:
The integration will mostly consist of switching to cTAKES types.

Here are a few critical spots:

In the tokenizer (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/Tokenizer.java),
lines 96 and 106 are all that should need changing to switch to cTAKES Sentence and Token
types.

In the pos-tagger (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/PosTagger.java)
most of the changes should be on lines 109 and 116-118.

In the MP Analyzer (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/MPAnalyzer.java)
the changes would be on lines 122-124, again to use the cTAKES token types.

The Dependency Parser (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/DependencyParser.java)
is a bit harder, but similar.  I think you can step through and find instances of ClearTK
types and swap them for the Dependency Relation types in cTAKES.  Basically the code grabs
the token, POS, and lemma data from the CAS and passes it on to Jinho's SRL.  Then the work
is in mapping that output back into CAS appropriate types.
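
To make that flow concrete, here is a minimal sketch in plain Java of the read / parse / write-back pattern described above. Every name in it (TokenAnn, Arc, DepRelation, Parser, TOY_PARSER) is a hypothetical stand-in invented for illustration; none of them are real ClearTK, cTAKES, UIMA, or ClearNLP types, and the actual wrapper would read from and write to the CAS rather than plain lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DependencyWrapperSketch {

    /** Hypothetical stand-in for a token annotation already stored in the CAS. */
    public static final class TokenAnn {
        public final String form, pos, lemma;
        public TokenAnn(String form, String pos, String lemma) {
            this.form = form; this.pos = pos; this.lemma = lemma;
        }
    }

    /** Hypothetical stand-in for one arc returned by the external parser. */
    public static final class Arc {
        public final int headIndex; // -1 marks the root
        public final String label;
        public Arc(int headIndex, String label) {
            this.headIndex = headIndex; this.label = label;
        }
    }

    /** Hypothetical stand-in for a dependency relation written back to the CAS. */
    public static final class DepRelation {
        public final TokenAnn dependent, head; // head is null for the root
        public final String label;
        public DepRelation(TokenAnn dependent, TokenAnn head, String label) {
            this.dependent = dependent; this.head = head; this.label = label;
        }
    }

    /** Hypothetical parser interface; the real ClearNLP API is not shown here. */
    public interface Parser {
        List<Arc> parse(List<String> forms, List<String> posTags, List<String> lemmas);
    }

    /** Step 1: read token data. Step 2: call the parser. Step 3: map arcs back. */
    public static List<DepRelation> annotate(List<TokenAnn> sentence, Parser parser) {
        List<String> forms = new ArrayList<>();
        List<String> posTags = new ArrayList<>();
        List<String> lemmas = new ArrayList<>();
        for (TokenAnn t : sentence) {
            forms.add(t.form);
            posTags.add(t.pos);
            lemmas.add(t.lemma);
        }
        List<Arc> arcs = parser.parse(forms, posTags, lemmas);
        List<DepRelation> relations = new ArrayList<>();
        for (int i = 0; i < arcs.size(); i++) {
            Arc arc = arcs.get(i);
            // Parser indices are resolved back to the original token objects.
            TokenAnn head = arc.headIndex < 0 ? null : sentence.get(arc.headIndex);
            relations.add(new DepRelation(sentence.get(i), head, arc.label));
        }
        return relations;
    }

    /** Toy parser for the demo: the first token is the root, all others attach to it. */
    public static final Parser TOY_PARSER = (forms, posTags, lemmas) -> {
        List<Arc> arcs = new ArrayList<>();
        for (int i = 0; i < forms.size(); i++) {
            arcs.add(i == 0 ? new Arc(-1, "root") : new Arc(0, "dep"));
        }
        return arcs;
    };

    public static void main(String[] args) {
        List<TokenAnn> sentence = Arrays.asList(
            new TokenAnn("denies", "VBZ", "deny"),
            new TokenAnn("pain", "NN", "pain"));
        for (DepRelation r : annotate(sentence, TOY_PARSER)) {
            String head = r.head == null ? "ROOT" : r.head.form;
            System.out.println(r.dependent.form + " --" + r.label + "--> " + head);
        }
    }
}
```

Under this sketch, switching type systems would only touch the two stand-in annotation classes and the loops that read and create them; the call into the parser stays the same.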

The Semantic Role Labeler (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/main/java/org/cleartk/clearnlp/SemanticRoleLabeler.java)
follows a similar flow, but also pulls out dependency parse information from the CAS.  Then
the work is in extracting the SRL arguments and predicates to put back into ClearTK CAS types.

Lastly, to get an idea of how these components are called in a UIMA pipeline, I would refer
to the test cases, especially the ClearNLP test case (https://code.google.com/p/cleartk/source/browse/cleartk-clearnlp/src/test/java/org/cleartk/clearnlp/ClearNLPTest.java).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
