ctakes-dev mailing list archives

From Azad Dehghan <azad.dehg...@gmail.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Tue, 03 Nov 2015 15:54:25 GMT
> Who else plans to provide patches for it? Just to avoid duplicate work
> and to coordinate the efforts ...

I would like to help with translating the JAPE rules to RUTA.

> Is there a development dataset which was utilized for the initial
> development, and if yes, is it possible to contribute it too?

The data set is unfortunately not publicly available. i2b2
<https://www.i2b2.org/NLP/DataSets/Main.php> typically releases its data
sets 12 months after a given challenge; access is granted on an individual
basis and involves a Data Use Agreement.

However, I will be able to conduct and coordinate the validation.

> My first step would be:
> - set up a maven project
> - set up a development pipeline in a test (with cTAKES components
> replacing the previous ANNIE preprocessing)

> But one item that we need to review is the 3rd party libs jars that
> were included to ensure compatibility.  I’ll be sure to take a look at
> that over the next few weeks.
> —Pei
@Pei - once the ANNIE components are replaced, there should not be a need
to worry about the 3rd party libs.

Also, just a thought: we may want to create an independent component for
the Two Pass recognition (TwoPass.java), as this method has proven useful
for general NER on longitudinal data and is surely useful independently of
the deid component.
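
To illustrate what such a standalone component could look like, here is a
minimal sketch. It assumes that the second pass simply re-scans related
documents (e.g. other notes of the same patient) for exact, whole-word
occurrences of strings recognized in the first pass. The class and method
names (TwoPassSketch, collectLexicon, secondPass) are hypothetical and are
not taken from TwoPass.java, whose actual logic may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of two-pass recognition; NOT the actual TwoPass.java. */
public class TwoPassSketch {

    /** A character span with its covered text. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String text;

        public Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Pass 1 result: collect the surface strings of first-pass mentions. */
    public static Set<String> collectLexicon(List<Span> firstPassMentions) {
        Set<String> lexicon = new LinkedHashSet<>();
        for (Span mention : firstPassMentions) {
            // Assumption: single characters are too ambiguous to propagate.
            if (mention.text.length() > 1) {
                lexicon.add(mention.text);
            }
        }
        return lexicon;
    }

    /** Pass 2: find whole-word, case-sensitive occurrences of lexicon entries. */
    public static List<Span> secondPass(String document, Set<String> lexicon) {
        List<Span> found = new ArrayList<>();
        for (String entry : lexicon) {
            // Pattern.quote treats the entry as a literal string.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(entry) + "\\b");
            Matcher m = p.matcher(document);
            while (m.find()) {
                found.add(new Span(m.start(), m.end(), m.group()));
            }
        }
        return found;
    }
}
```

The first-pass mentions could come from any recognizer (rules or ML); the
sketch only depends on spans, which is what would keep such a component
independent of the deid pipeline.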

