ctakes-dev mailing list archives

From Azad Dehghan <azad.dehg...@gmail.com>
Subject RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Fri, 11 Mar 2016 10:25:09 GMT
MIST did very well in the 2006 i2b2 challenge on a very limited set of PHI
entity types. The 2014 challenge evaluated a more comprehensive set of PHI
types, with a number of different methods being proposed.

The issue of PHI leaks is an interesting one that keeps recurring. I
cannot see clinical data being released as 'open data' without additional
safeguards such as a data use agreement. Also, it is good that cTAKES
has started to take de-identification on board, as the removal of PHI
remains a hurdle for clinical data access in an increasing number of
institutions. After all, it is a problem that has an NLP solution.

Azad
On 10 Mar 2016 21:27, "Savova, Guergana" <
Guergana.Savova@childrens.harvard.edu> wrote:

> You can re-build the models that feed into MIST. I personally would not
> use the default model that MIST comes with as it is not trained on clinical
> data. In our previous work we found that hand-annotating about 200 docs for
> PHI (representative of the sample you are going to run the models on)
> results in building a pretty good model - in the 90's for p, r and f1.
> However, even with that high performance, the institution that owns the
> data might be still reluctant to share as it might pose a violation of
> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
> de-couple the de-identification from the NLP/information extraction. If a
> user has the need for de-identified data, they could choose their method --
> manual or otherwise -- and then process through cTAKES. Our focus is the
> NLP/IE space, while de-identification is a blend of that plus policy....
>
> --Guergana
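
For readers unfamiliar with the "p, r and f1" figures mentioned above, here is a minimal sketch of how precision, recall and F1 could be computed for PHI annotations. It assumes exact-match scoring on (start, end, type) spans, which the thread does not specify; the names `Span`, `prf1` and `leaked` are hypothetical and not part of MIST or cTAKES.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, PHI type).
Span = Tuple[int, int, str]


def prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 (assumed matching criterion)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def leaked(gold: Set[Span], predicted: Set[Span]) -> Set[Span]:
    """Gold PHI spans the system missed, i.e. potential PHI leaks."""
    return gold - predicted
```

Recall is the figure most directly tied to leaks: every gold span returned by `leaked` is PHI that would remain in the released text.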
>
> -----Original Message-----
> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> Sent: Thursday, March 10, 2016 4:19 PM
> To: dev@ctakes.apache.org
> Subject: RE: Combining Knowledge- and Data-driven Methods for
> De-identification of Clinical Narratives
>
> Thanks Guergana.
>
> > Yes, the current release of cTAKES has a module for the temporal
> > expressions which includes dates. The normalizer for the temporal
> > expressions is Steven Bethard's timenorm code.
> >
>
> Great.
>
> > However, if you do de-identification of dates/temporal expressions, you
> > run the risk of creating incorrect timelines as many of the relative
> > temporal expressions (e.g. spring of this year, x-mas time, etc.) are
> > unlikely to be correctly shifted by any de-identification tool.
> >
> Indeed, a reason I have not included the dates component.
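
The timeline risk described above can be illustrated with a small sketch. It assumes a simple date-shifting scheme that moves every ISO-formatted date by a fixed per-patient offset; this is an illustration only, not how MIST, cDeid or cTAKES handle dates, and `shift_dates` is a hypothetical helper.

```python
import re
from datetime import date, timedelta

# Only absolute dates written as YYYY-MM-DD are recognised in this sketch.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def shift_dates(text: str, offset_days: int) -> str:
    """Shift every ISO date in the text by offset_days; leave all else as-is."""

    def _shift(match: re.Match) -> str:
        # Raises ValueError for impossible dates such as 2015-02-30.
        original = date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
        return (original + timedelta(days=offset_days)).isoformat()

    return ISO_DATE.sub(_shift, text)


note = "Admitted 2015-03-10. Symptoms began in spring of this year."
shifted = shift_dates(note, 120)
```

With a 120-day offset the admission date moves into July, while "spring of this year" is left untouched, so the shifted note no longer describes a consistent timeline.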
>
> > One de-identification tool is MIST --
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
> .
> >
> I don't remember them doing well in the community held evaluation in 2014.
> Hence, cDeid :)
> >
> > Guergana Savova, PhD, FACMI
> > Associate Professor
> > PI Natural Language Processing Lab
> > Boston Children's Hospital and Harvard Medical School
> > 300 Longwood Avenue
> > Mailstop: BCH3092
> > Enders 144.1
> > Boston, MA 02115
> > Tel: (617) 919-2972
> > Fax: (617) 730-0817
> > Harvard Scholar:
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__scholar.harvard.ed
> > u_guergana-5Fk-5Fsavova_biocv&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14J
> > ZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGm
> > RCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=3taiTxFp55
> > iQUnc6A6Yemg-XzFQrRjo5QZRQeKHQ29c&e=
> >
> > -----Original Message-----
> > From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
> > Sent: Thursday, March 10, 2016 3:42 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Combining Knowledge- and Data-driven Methods for
> > De-identification of Clinical Narratives
> >
> > > This means both training data folders? I have access to the data but
> > > not to the challenge description.
> >
> > Yes. Is there any specific information that you are missing?
> > >
> > >
> > >> It would be good to incorporate/refactor the two-pass recognition
> > >> method for cTAKES (basically, the GATE API needs to be replaced with
> > >> the UIMA API to generate annotations); it has a wider application on
> > >> longitudinal data. This method is used on top of a number of NERs.
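
The thread names the two-pass recognition method without describing it, so the following is only a sketch under a stated assumption: that the first pass runs the existing recognizers, and the second pass re-matches the strings they found across all documents of the same patient to catch mentions the recognizers missed. All names here (`two_pass`, `titled_name`, `Mention`) are hypothetical and do not reflect TwoPass.java or any cTAKES API.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Mention = Tuple[int, int, str]  # (start offset, end offset, label)
Recognizer = Callable[[str], List[Mention]]


def two_pass(documents: List[str], recognizers: List[Recognizer]) -> List[Set[Mention]]:
    """Run recognizers, then propagate their findings across the documents."""
    # Pass 1: collect mentions per document and build a run-time lexicon
    # mapping each recognised surface string to its label.
    first_pass: List[Set[Mention]] = []
    lexicon: Dict[str, str] = {}
    for doc in documents:
        found: Set[Mention] = set()
        for recognize in recognizers:
            found.update(recognize(doc))
        first_pass.append(found)
        for start, end, label in found:
            lexicon.setdefault(doc[start:end], label)

    # Pass 2: re-match every lexicon entry in every document.
    results: List[Set[Mention]] = []
    for doc, found in zip(documents, first_pass):
        merged = set(found)
        for surface, label in lexicon.items():
            for match in re.finditer(r"\b" + re.escape(surface) + r"\b", doc):
                merged.add((match.start(), match.end(), label))
        results.append(merged)
    return results


def titled_name(text: str) -> List[Mention]:
    """Toy recognizer: a capitalised word following 'Dr. ' is a NAME."""
    return [(m.start(1), m.end(1), "NAME")
            for m in re.finditer(r"\bDr\. ([A-Z][a-z]+)", text)]
```

Given the notes "Seen by Dr. Smith today." and "Smith reviewed the labs.", the toy recognizer alone finds the name only in the first note; the second pass also tags "Smith" in the second note, which is the kind of gain one would expect on longitudinal records.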
> > >
> > >
> > > I'll take a look.
> > >
> > > I do not know how much time I can invest this month. Let's see how
> > > many phases I can translate.
> > >
> > > I added the rules for age. Are there JAPE rules for creating date
> > > annotations?
> > >
> >
> > No. I believe cTAKES has existing component(s) to capture dates?
> >
> > > After all rules are translated, they need some major refactoring.
> > > JAPE and Ruta are quite different in some aspects.
> > >
> > Ok.
> >
> > >
> > >
> > >
> > >
> > >
> > >> Please let me know where I can help. I will be available again in
> > >> April.
> > >>
> > >> Cheers,
> > >> Azad
> > >>
> > >> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> sorry, I was quite busy last month.
> > >>>
> > >>> I added a new patch, which needs to be applied.
> > >>>
> > >>> No new rules, but it's possible now to evaluate everything against
> > >>> the labelled data of the challenge.
> > >>>
> > >>> @Azad:
> > >>> Which documents exactly did you use to develop the rules?
> > >>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> > >>> testing-PHI-Gold-fixed?
> > >>>
> > >>> Best,
> > >>>
> > >>> Peter
> > >>>
> > >>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> the last patch fixed almost all problems.
> > >>>>
> > >>>> I added another one that adds the csv file for the unit test and
> > >>>> extends svn-ignore.
> > >>>>
> > >>>> Best,
> > >>>>
> > >>>> Peter
> > >>>>
> > >>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I added another patch. I forgot to manually add one test file to
> > >>>>> version control, and there are still duplicate lines.
> > >>>>> I hope this patch fixes the remaining problems.
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Peter
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> the problems were caused by the svn client in my Eclipse. Sorry
> > >>>>>> for the trouble, I should have looked more closely at the
> > >>>>>> complete patch.
> > >>>>>>
> > >>>>>> I attached a new patch created with command-line tools, which
> > >>>>>> looks correct now.
> > >>>>>>
> > >>>>>> Pei, can you apply the new patch?
> > >>>>>>
> > >>>>>> Best,
> > >>>>>>
> > >>>>>> Peter
> > >>>>>>
> > >>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> > >>>>>>>
> > >>>>>>> Thanks Pei.
> > >>>>>>>
> > >>>>>>> I fear there was again a problem with the patch. All new files
> > >>>>>>> are missing (and also the svn-ignore settings).
> > >>>>>>>
> > >>>>>>> Can you take a look?
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>>
> > >>>>>>> Peter
> > >>>>>>>
> > >>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> > >>>>>>>>
> > >>>>>>>> patch applied.
> > >>>>>>>> Thanks,
> > >>>>>>>> Pei
> > >>>>>>>>
> > >>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi Pei,
> > >>>>>>>>>
> > >>>>>>>>> can you commit the recent patch for us?
> > >>>>>>>>>
> > >>>>>>>>> CTAKES-384-20160120.patch
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Peter
> > >>>>>>>>>
> > >>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>> Sorry I was swamped recently.
> > >>>>>>>>>> But yeah, we can even create an extended type system to
> > >>>>>>>>>> store these items temporarily and add them into the
> > >>>>>>>>>> main/core type system afterwards.
> > >>>>>>>>>>
> > >>>>>>>>>> There was an existing item to upgrade UIMA, but agreed - it
> > >>>>>>>>>> will require much more testing. If it works, we can upgrade
> > >>>>>>>>>> it in our sandbox area or create a branch if necessary.
> > >>>>>>>>>>
> > >>>>>>>>>> —Pei
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>>
> > >>>>>>>>>>> a new patch is attached.
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Pei:
> > >>>>>>>>>>> are there suitable annotation types in the cTAKES type
> > >>>>>>>>>>> system? Some project in cTAKES uses something like
> > >>>>>>>>>>> OntologyMatch... I map it to IdentifiedAnnotation right
> > >>>>>>>>>>> now, but there are many empty features...
> > >>>>>>>>>>>
> > >>>>>>>>>>> @Azad:
> > >>>>>>>>>>> I changed the rules a bit, especially the capitalization,
> > >>>>>>>>>>> to match how I normally use it in Ruta. The word lists are
> > >>>>>>>>>>> compiled to a trie by the Maven plugin. I also added the
> > >>>>>>>>>>> two regexes for url and email. I extended the regex for
> > >>>>>>>>>>> the url. I also changed the evaluation order of some rules
> > >>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv
> > >>>>>>>>>>> for the unit tests.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Let me know if you need more information about the changes.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Do you want help with the other rule sets? Or should we
> > >>>>>>>>>>> split them up?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Peter
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> great. I will integrate them in the project and in the
> > >>>>>>>>>>>> next patch.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Peter
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Three NERs translated and uploaded.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> PS. I will validate all NERs once we have them all
> > >>>>>>>>>>>>> completed.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are
> > >>>>>>>>>>>>>> any more volunteers for translating JAPE to RUTA,
> > >>>>>>>>>>>>>> please get in touch.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten
> > >>>>>>>>>>>>>>> about it. Unfortunately, there is just no spare time
> > >>>>>>>>>>>>>>> right now. I hope I will be able to provide the
> > >>>>>>>>>>>>>>> patches in December.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Peter
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Peter,
> > >>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting
> > >>>>>>>>>>>>>>>> point, at least in terms of maven modules, etc. I
> > >>>>>>>>>>>>>>>> think it would be good if we use the uimaFIT style as
> > >>>>>>>>>>>>>>>> the primary approach to wiring components together
> > >>>>>>>>>>>>>>>> and generate desc's as secondary...
> > >>>>>>>>>>>>>>>> I think the actual components that would be required
> > >>>>>>>>>>>>>>>> are probably best left up to what is actually
> > >>>>>>>>>>>>>>>> required for the best performing c-deid. The output
> > >>>>>>>>>>>>>>>> would be interesting; I'm not sure if we should treat
> > >>>>>>>>>>>>>>>> this as an independent preprocessing component or as
> > >>>>>>>>>>>>>>>> part of a pipeline (in which case, we may need to
> > >>>>>>>>>>>>>>>> propose a change to the type system or perhaps an
> > >>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that
> > >>>>>>>>>>>>>>>> discussion to the dev group as you see fit.)
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> My 2 cents...
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an
> > >>>>>>>>>>>>>>>>> example of how the cTAKES community develops or what
> > >>>>>>>>>>>>>>>>> a project should look like?
> > >>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects
> > >>>>>>>>>>>>>>>>> in quite different manners, and I do not want to get
> > >>>>>>>>>>>>>>>>> inspired by "some sort of out-dated" approach in the
> > >>>>>>>>>>>>>>>>> cTAKES repo.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
> > >>>>>>>>>>>>>>>>> preprocessing components that should be used and the
> > >>>>>>>>>>>>>>>>> kind of "output" of the project?
> > >>>>>>>>>>>>>>>>> Components: On which components may the components
> > >>>>>>>>>>>>>>>>> rely: tokenizer, ... parser, ... dict lookup?
> > >>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a
> > >>>>>>>>>>>>>>>>> single AE?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> More comments below.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to
> > >>>>>>>>>>>>>>>>>>> avoid duplicate work and to coordinate the efforts
> > >>>>>>>>>>>>>>>>>>> ...
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta
> > >>>>>>>>>>>>>>>>> Workbench if you want, or wait until I set up the
> > >>>>>>>>>>>>>>>>> project with Ruta integration.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized
> > >>>>>>>>>>>>>>>>>>> for the initial development, and if yes, is it
> > >>>>>>>>>>>>>>>>>>> possible to contribute it too?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly
> > >>>>>>>>>>>>>>>>>> available; i2b2
> > >>>>>>>>>>>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.i2b2.org_NLP_DataSets_Main.php&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=1Qpd4A2PgVD13w31PkkvmJf6I0PTCatCzgBgsnetPOg&s=aAEeORyMtz7NCv-6EEgiABVY_Rf6zLnJghQh2DA_CKQ&e=>
> > >>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a
> > >>>>>>>>>>>>>>>>>> given challenge; this is done on an individual
> > >>>>>>>>>>>>>>>>>> basis and involves a Data Use Agreement.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate
> > >>>>>>>>>>>>>>>>>> the validation.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access
> > >>>>>>>>>>>>>>>>> to the dataset here.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> My first step would be:
> > >>>>>>>>>>>>>>>>>>> - set up a maven project
> > >>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with
> > >>>>>>>>>>>>>>>>>>>   cTAKES components replacing the previous ANNIE
> > >>>>>>>>>>>>>>>>>>>   preprocessing)
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd
> > >>>>>>>>>>>>>>>>>>> party lib jars that were included, to ensure
> > >>>>>>>>>>>>>>>>>>> compatibility. I'll be sure to take a look at that
> > >>>>>>>>>>>>>>>>>>> over the next few weeks.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> —Pei
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> @Pei - once the ANNIE components are replaced there
> > >>>>>>>>>>>>>>>>>> should not be a need to worry about the 3rd party
> > >>>>>>>>>>>>>>>>>> libs.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an
> > >>>>>>>>>>>>>>>>>> independent component for the Two Pass recognition
> > >>>>>>>>>>>>>>>>>> (TwoPass.java), as this method has proven useful
> > >>>>>>>>>>>>>>>>>> for general NER on longitudinal data and is surely
> > >>>>>>>>>>>>>>>>>> useful independently of the deid component.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>> Azad
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >
>
