ctakes-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Thu, 10 Mar 2016 19:57:13 GMT
Hi,

On 10.03.2016 at 20:29, Azad Dehghan wrote:
> Thanks Peter,
>
> The rules were modeled using the training data.

Does this mean both training data folders? I have access to the data but not 
to the challenge description.

> It would be good to incorporate/refactor the two-pass recognition method
> for cTAKES (basically, the GATE API needs to be replaced with the UIMA API
> to generate annotations), as it has wider application to longitudinal data.
> This method is used on top of a number of NERs.

I'll take a look.

I do not know how much time I can invest this month. Let's see how many 
phases I can translate.

I added the rules for age. Are there JAPE rules for creating date 
annotations?

After all rules are translated, they will need some major refactoring. JAPE 
and Ruta are quite different in some respects.

Best,

Peter




> Please let me know where I can help. I will be available again in April.
>
> Cheers,
> Azad
>
> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
>
>> Hi,
>>
>> sorry, I was quite busy last month.
>>
>> I added a new patch, which needs to be applied.
>>
>> No new rules, but it's possible now to evaluate everything against the
>> labelled data of the challenge.
>>
>> @Azad:
>> Which documents exactly did you use to develop the rules?
>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
>>
>> Best,
>>
>> Peter
>>
>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>> Hi,
>>>
>>> the last patch fixed almost all problems.
>>>
>>> I added another one that adds the csv file for the unit test and extends
>>> svn-ignore.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>> Hi,
>>>>
>>>> I added another patch. I forgot to manually add one test file to version
>>>> control, and there are still duplicate lines.
>>>> I hope this patch fixes the remaining problems.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>>
>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>> Hi,
>>>>>
>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the
>>>>> trouble, I should have looked more closely at the complete patch.
>>>>>
>>>>> I attached a new patch created with command-line tools, which looks
>>>>> correct now.
>>>>>
>>>>> Pei, can you apply the new patch?
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>> Thanks Pei.
>>>>>>
>>>>>> I fear there was again a problem with the patch. All new files are
>>>>>> missing (and also the svn-ignore settings).
>>>>>>
>>>>>> Can you take a look?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>> patch applied.
>>>>>>> Thanks,
>>>>>>> Pei
>>>>>>>
>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>> Hi Pei,
>>>>>>>>
>>>>>>>> can you commit the recent patch for us?
>>>>>>>>
>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>> Hi,
>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>> But yeah, we can even create an extended type system to store these
>>>>>>>>> items temporarily and add them into the main/core type system afterwards.
>>>>>>>>> There was an existing item to upgrade UIMA, but agreed - it will require
>>>>>>>>> much more testing. If it works, we can upgrade it in our sandbox area or
>>>>>>>>> create a branch if necessary.
>>>>>>>>> —Pei
>>>>>>>>>
>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> a new patch is attached.
>>>>>>>>>>
>>>>>>>>>> @Pei:
>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
>>>>>>>>>>
>>>>>>>>>> @Azad:
>>>>>>>>>> I changed the rules a bit, especially the capitalization, to match how
>>>>>>>>>> I normally use it in Ruta. The word lists are compiled to a trie by the
>>>>>>>>>> Maven plugin. I also added the two regexes for url and email, and I
>>>>>>>>>> extended the regex for the url. I also changed the evaluation order of
>>>>>>>>>> some rules (with @). Feel free to add simple examples to examples.csv
>>>>>>>>>> for the unit tests.
>>>>>>>>>>
>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>
>>>>>>>>>> Do you want help with the other rule sets? Or should we split them up?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> great. I will integrate them into the project and the next patch.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>
>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Azad
>>>>>>>>>>>>
>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more
>>>>>>>>>>>>> volunteers for translating JAPE to RUTA, please get in touch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope I will be
>>>>>>>>>>>>>> able to provide the patches in December.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point, at least
>>>>>>>>>>>>>>> in terms of maven modules, etc. I think it would be good if we use
>>>>>>>>>>>>>>> uimaFIT style as the primary approach to wiring components together and
>>>>>>>>>>>>>>> generate desc's as secondary...
>>>>>>>>>>>>>>> I think the actual components that would be required are probably best
>>>>>>>>>>>>>>> left up to what is actually required for the best-performing c-deid. The
>>>>>>>>>>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that discussion to
>>>>>>>>>>>>>>> the dev group as you see fit.)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>>>>>>>>>>> community develops or what a project should look like?
>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>>>>>>>>>>> ways, and I do not want to get inspired by "some sort of out-dated"
>>>>>>>>>>>>>>>> approach in the cTAKES repo.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>>>>>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>>>>>>>>>>> parser, ... dict lookup?
>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>>>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>>>>>>>> wait until I set up the project with Ruta integration.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the
>>>>>>>>>>>>>>>>> data sets 12 months after a given challenge; this is done on an
>>>>>>>>>>>>>>>>> individual basis and involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the dataset here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>>>>>>>>   replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>>>>>>>>>>> were included to ensure compatibility. I’ll be sure to take a look at
>>>>>>>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
>>>>>>>>>>>>>>>>> worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>>>>>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
>>>>>>>>>>>>>>>>> for general NER on longitudinal data and is surely useful independent of
>>>>>>>>>>>>>>>>> the deid component.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>
>>

