ctakes-dev mailing list archives

From Azad Dehghan <azad.dehg...@gmail.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Mon, 18 Jan 2016 19:13:35 GMT
Peter,

Thanks for pushing things!

I would rather split the rules/NERs to get things moving quicker (as I am a
newbie to Ruta). I will be uploading another NER (Username) shortly. I will
look at your changes to follow suit.

Best,
Azad

On 18 January 2016 at 14:06, Peter Klügl <peter.kluegl@averbis.com> wrote:

> Hi,
>
> a new patch is attached.
>
> @Pei:
> are there suitable annotation types in the cTAKES type system? One
> project in cTAKES uses something like OntologyMatch... I currently map it
> to IdentifiedAnnotation, but that leaves many features empty...
>
> @Azad:
> I changed the rules a bit, especially the capitalization, the way I
> normally use it in Ruta. The wordlists are compiled to a trie by the Maven
> plugin. I also added the two regexes for URL and email, and extended the
> regex for the URL. I also changed the evaluation order of some rules
> (with @). Feel free to add simple examples to examples.csv for the unit
> tests.
>
> Let me know if you need more information about the changes.
>
> Do you want help with the other rule sets, or should we split them
> up?
>
> Best,
>
> Peter
>
> On 18.01.2016 at 11:04, Peter Klügl wrote:
> > Hi,
> >
> > great. I will integrate them in the project and in the next patch.
> >
> > Best,
> >
> > Peter
> >
> > On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >> Three NERs translated and uploaded.
> >>
> >> PS. I will validate all NERs once we have them all completed.
> >>
> >> Cheers,
> >> Azad
> >>
> >> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
> >>
> >>> This is on my todo list for Dec. as well. If there are any more volunteers
> >>> for translating JAPE to RUTA, please get in touch.
> >>>
> >>> Cheers,
> >>> Azad
> >>>
> >>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
> >>>> there is just no spare time right now. I hope I will be able to provide
> >>>> the patches in December.
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>> Hi Peter,
> >>>>> I think ctakes-examples is probably a good starting point, at least
> >>>>> in terms of Maven modules, etc. I think it would be good if we used
> >>>>> the uimaFIT style as the primary approach to wiring components together
> >>>>> and generated descriptors as a secondary one...
> >>>>> I think the actual components that would be required are probably best
> >>>>> left up to what is actually required for the best-performing c-deid. The
> >>>>> output is an interesting question; I'm not sure whether we should treat
> >>>>> this as an independent preprocessing component or as part of a pipeline
> >>>>> (in which case we may need to propose a change to the type system or
> >>>>> perhaps an alternative JCas view). You can probably open up that
> >>>>> discussion to the dev group as you see fit.
> >>>>>
> >>>>> My 2 cents...
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Is there a cTAKES project that may serve as an example of how the
> >>>>>> cTAKES community develops, or of what a project should look like?
> >>>>>> I learned that different people set up UIMA projects in quite
> >>>>>> different ways, and I do not want to get inspired by "some sort of
> >>>>>> outdated" approach in the cTAKES repo.
> >>>>>>
> >>>>>> Are there restrictions or preferences regarding the preprocessing
> >>>>>> components that should be used and the kind of "output" of the project?
> >>>>>> Components: Which components may the components rely on: tokenizer,
> >>>>>> ... parser, ... dict lookup?
> >>>>>> "output": Should the project provide a pipeline or a single AE?
> >>>>>>
> >>>>>> More comments below.
> >>>>>>
> >>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
> >>>>>>>> and to coordinate the efforts ...
> >>>>>>>>
> >>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
> >>>>>> wait until I set up the project with Ruta integration.
> >>>>>>
> >>>>>> If any questions arise, just ask :-)
> >>>>>>
> >>>>>>>> Is there a development dataset which was utilized for the initial
> >>>>>>>> development, and if yes, is it possible to contribute it too?
> >>>>>>>>
> >>>>>>> The data set is unfortunately not publicly available; i2b2
> >>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
> >>>>>>> sets 12 months after a given challenge; this is done on an individual
> >>>>>>> basis and involves a Data Use Agreement.
> >>>>>>>
> >>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>
> >>>>>> OK, I'll investigate whether we already have access to the dataset here.
> >>>>>>
> >>>>>>
> >>>>>>>> My first step would be:
> >>>>>>>> - set up a Maven project
> >>>>>>>> - set up a development pipeline in a test (with cTAKES components
> >>>>>>>>   replacing the previous ANNIE preprocessing)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> But one item that we need to review is the 3rd party library jars that
> >>>>>>>> were included, to ensure compatibility. I'll be sure to take a look at
> >>>>>>>> that over the next few weeks.
> >>>>>>>>
> >>>>>>>> —Pei
> >>>>>>>>
> >>>>>>>>
> >>>>>>> @Pei - once the ANNIE components are replaced, there should not be a
> >>>>>>> need to worry about the 3rd party libs.
> >>>>>>>
> >>>>>>> Also, just a thought: we may want to create an independent component for
> >>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
> >>>>>>> for general NER on longitudinal data and would surely be useful
> >>>>>>> independently of the deid component.
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Azad
> >>>>>>>
>
>
