ctakes-dev mailing list archives

From Azad Dehghan <azad.dehg...@gmail.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Sun, 17 Jul 2016 09:41:47 GMT
Hi Peter,

I hope to pick this up soon after the summer.


Cheers,
Azad



2016-06-07 8:57 GMT+01:00 Peter Klügl <peter.kluegl@averbis.com>:

> Hi Azad,
>
>
> the basic rules are now translated. Do you wanna take a look at it?
>
>
> Many issues still remain, and the F-score is quite low on the dev
> set. I will continue improving the rules when I find the time.
>
>
> Best,
>
> Peter
>
>
> On 15.03.2016 at 09:49, Peter Klügl wrote:
> > Hi,
> >
> > this is essentially just a design decision. For a single longitudinal
> > record, there is no problem at all. We can solve this even with some
> > simple ruta rules, or with a custom analysis engine. If we want to
> > process a set of records of the same patient jointly, then we cannot
> > apply a single pipeline. I propose to postpone the decision and implement
> > it only for single documents for now.
> >
> > Best,
> >
> > Peter
> >
> >
> > On 11.03.2016 at 20:03, Azad Dehghan wrote:
> >>> I had a quick look at PassTwo. This is not directly translatable into
> >>> UIMA if the functionality is based on analysis engines. Normally,
> >>> analysis engines process one document at a time in a pipeline. My first
> >>> quick guess is that we either need two pipelines (result is a program,
> >>> not a component) or we need a different definition of a CAS (joining all
> >>> documents of a patient). Overall, it depends on the targeted use case of
> >>> the project. Should it be usable in a cTAKES/uimaFIT pipeline?
> >>>
> >> The two pass method will have a broader applicability for NER on
> >> longitudinal records...
> >>
> >>
> >>> btw, the CRF models are not part of the contribution, right?
> >>>
> >>>
> >> The CRF (UK, US) models will be released, but this will be together with
> >> more mature software planned for August 2016.
> >>
> >> Best,
> >>> Peter
> >>>
> >>> On 10.03.2016 at 20:29, Azad Dehghan wrote:
> >>>> Thanks Peter,
> >>>>
> >>>> The rules were modeled using the training data.
> >>>>
> >>>> It would be good to incorporate/refactor the two-pass recognition
> >>>> method for cTAKES (basically, the GATE API needs to be replaced with the
> >>>> UIMA API to generate annotations). The method has a wider application on
> >>>> longitudinal data and is used on top of a number of NERs.
> >>>>
> >>>> Please let me know where I can help. I will be available again in April.
> >>>>
> >>>> Cheers,
> >>>> Azad
> >>>>
> >>>> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> sorry, I was quite busy last month.
> >>>>>
> >>>>> I added a new patch, which needs to be applied.
> >>>>>
> >>>>> No new rules, but it's possible now to evaluate everything against the
> >>>>> labelled data of the challenge.
> >>>>>
> >>>>> @Azad:
> >>>>> Which documents exactly did you use to develop the rules?
> >>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> >>>>> testing-PHI-Gold-fixed?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> the last patch fixed almost all problems.
> >>>>>>
> >>>>>> I added another one that adds the csv file for the unit test and extends
> >>>>>> svn-ignore.
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I added another patch. I forgot to manually add one test file to version
> >>>>>>> control, and there are still duplicate lines.
> >>>>>>> I hope this patch fixes the remaining problems.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>>
> >>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the
> >>>>>>>> trouble, I should have looked more closely at the complete patch.
> >>>>>>>>
> >>>>>>>> I attached a new patch created with command-line tools, which looks correct
> >>>>>>>> now.
> >>>>>>>>
> >>>>>>>> Pei, can you apply the new patch?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Peter
> >>>>>>>>
> >>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> >>>>>>>>> Thanks Pei.
> >>>>>>>>>
> >>>>>>>>> I fear there was again a problem with the patch. All new files are
> >>>>>>>>> missing (and also the svn-ignore settings).
> >>>>>>>>>
> >>>>>>>>> Can you take a look?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> >>>>>>>>>> patch applied.
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Pei
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>> Hi Pei,
> >>>>>>>>>>>
> >>>>>>>>>>> can you commit the recent patch for us?
> >>>>>>>>>>>
> >>>>>>>>>>> CTAKES-384-20160120.patch
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> Sorry I was swamped recently.
> >>>>>>>>>>>> But yeah, we can even create an extended type system to store
> >>>>>>>>>>>> these items temporarily and add them into the main/core type system
> >>>>>>>>>>>> afterwards.
> >>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed, it will
> >>>>>>>>>>>> require much more testing. If it works, we can upgrade it in our sandbox
> >>>>>>>>>>>> area or create a branch if necessary.
> >>>>>>>>>>>> —Pei
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a new patch is attached.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Pei:
> >>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
> >>>>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
> >>>>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Azad:
> >>>>>>>>>>>>> I changed the rules a bit, especially the capitalization, to match how I
> >>>>>>>>>>>>> normally use it in ruta. The wordlists are compiled to a trie by the maven
> >>>>>>>>>>>>> plugin. I also added the two regexes for url and email, and I extended the
> >>>>>>>>>>>>> regex for the url. I also changed the evaluation order of some rules
> >>>>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv for the unit
> >>>>>>>>>>>>> tests.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me know if you need more information about the changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Do you want help with the other rule sets? Or should we split them up?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >>>>>>>>>>>>>>> Three NERs translated and uploaded.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
> >>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers
> >>>>>>>>>>>>>>>> for translating JAPE to RUTA, please get in touch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
> >>>>>>>>>>>>>>>>> there is just no spare time right now. I hope I will be able to provide
> >>>>>>>>>>>>>>>>> the patches in December.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>>>>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting point, at least
> >>>>>>>>>>>>>>>>>> in terms of maven modules, etc. I think it would be good if we use
> >>>>>>>>>>>>>>>>>> uimaFIT style as the primary approach to wiring components together and
> >>>>>>>>>>>>>>>>>> generate desc's as secondary...
> >>>>>>>>>>>>>>>>>> I think the choice of actual components is probably best
> >>>>>>>>>>>>>>>>>> left up to what is actually required for best performing c-deid. The
> >>>>>>>>>>>>>>>>>> output would be interesting. I'm not sure if we should treat this as
> >>>>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
> >>>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
> >>>>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that discussion to
> >>>>>>>>>>>>>>>>>> the dev group as you see fit.)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> My 2 cents...
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
> >>>>>>>>>>>>>>>>>>> community develops or what a project should look like?
> >>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different
> >>>>>>>>>>>>>>>>>>> ways, and I do not want to get inspired by "some sort of out-dated"
> >>>>>>>>>>>>>>>>>>> approach in the cTAKES repo.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
> >>>>>>>>>>>>>>>>>>> that should be used and the kind of "output" of the project?
> >>>>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
> >>>>>>>>>>>>>>>>>>> parser, ... dict lookup?
> >>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> More comments below.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
> >>>>>>>>>>>>>>>>>>>>> and to coordinate the efforts ...
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
> >>>>>>>>>>>>>>>>>>> wait until I set up the project with ruta integration.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
> >>>>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
> >>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
> >>>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
> >>>>>>>>>>>>>>>>>>>> sets 12 months after a given challenge; this is done on an individual basis
> >>>>>>>>>>>>>>>>>>>> and involves a Data Use Agreement.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the dataset here.
> >>>>>>>>>>>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
> >>>>>>>>>>>>>>>>>>>>> replacing the previous ANNIE preprocessing)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
> >>>>>>>>>>>>>>>>>>>>> were included to ensure compatibility. I'll be sure to take a look at
> >>>>>>>>>>>>>>>>>>>>> that over the next few weeks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
—Pei
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
> >>>>>>>>>>>>>>>>>>>> worry about the 3rd party libs.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component for
> >>>>>>>>>>>>>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has been shown useful
> >>>>>>>>>>>>>>>>>>>> for general NER on longitudinal data and is surely useful independent of the
> >>>>>>>>>>>>>>>>>>>> deid component.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>>>>>
>
>
