ctakes-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Date Fri, 11 Mar 2016 10:32:50 GMT
Hi,

thanks for the notes and links, Andy and Guergana. The software and
articles are very interesting, but, as for my personal interest, we have
our own clinical deidentification software solution at our company
(which works well enough as far as I know). My focus is rather on
helping out in translating the contribution from GATE/JAPE to UIMA/Ruta.
Thus, I concentrate on the existing functionality for now.

What is the final goal of the cTAKES community concerning clinical deid
components? Will both sandbox projects be merged, and what about
statistical approaches?

@Pei: there was again a problem with the patch (I also forgot to add
some files). I attached a new one.

@Azad: I am just curious which data exactly the rules rely on. I think
I'll find the information in the article.
I assume that the 521 documents have been utilized to develop the rules
and the 269 documents to evaluate them. Did you also correct the rules
using the second set? I need to reread the article :-)

Best,

Peter


On 10.03.2016 at 23:22, andy mcmurry wrote:
> *For cross-validation, you can evaluate de-identified notes data from
> the i2b2 challenge:*
> https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-scrubber-deid/data/models/
>
> *Methods for model generation of FeatureSet described here: *
>
> *Improved de-identification of physician notes through integrative modeling
> of both public and private medical text*
> http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-112
>
> Major objective of that study was to help provide external examples to
> cross train / retrain other methods.
>
> hope this helps,
> --Andy
>
>
>
> On Thu, Mar 10, 2016 at 1:27 PM, Savova, Guergana <
> Guergana.Savova@childrens.harvard.edu> wrote:
>
>> You can re-build the models that feed into MIST. I personally would not
>> use the default model that MIST comes with as it is not trained on clinical
>> data. In our previous work we found that hand-annotating about 200 docs for
>> PHI (representative of the sample you are going to run the models on)
>> results in building a pretty good model - in the 90's for p, r and f1.
>> However, even with that high performance, the institution that owns the
>> data might be still reluctant to share as it might pose a violation of
>> HIPAA through some potential PHI leaks. In cTAKES our approach has been to
>> de-couple the de-identification from the NLP/information extraction. If a
>> user has the need for de-identified data, they could choose their method --
>> manual or otherwise -- and then process through cTAKES. Our focus is the
>> NLP/IE space, while de-identification is a blend of that plus policy....
>>
>> --Guergana
>>
>> -----Original Message-----
>> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
>> Sent: Thursday, March 10, 2016 4:19 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: Combining Knowledge- and Data-driven Methods for
>> De-identification of Clinical Narratives
>>
>> Thanks Guergana.
>>
>>> Yes, the current release of cTAKES has a module for the temporal
>>> expressions which includes dates. The normalizer for the temporal
>>> expressions is Steven Bethard's timenorm code.
>> Great.
>>
>>> However, if you do de-identification of dates/temporal expressions,
>>> you run the risk of creating incorrect timelines as many of the relative
>>> temporal expressions (e.g. spring of this year, x-mas time, etc.) are
>>> unlikely to be correctly shifted by any de-identification tool.
>> Indeed, a reason I have not included the dates component.
>>
>>> One de-identification tool is MIST --
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__mist-2Ddeid.sourceforge.net_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=5awdXn2I-hRE0-161tqFDGgmYgQQviQg360uHI4fs2s&e=
>>> .
>> I don't remember them doing well in the community held evaluation in 2014.
>> Hence, cDeid :)
>>> Guergana Savova, PhD, FACMI
>>> Associate Professor
>>> PI Natural Language Processing Lab
>>> Boston Children's Hospital and Harvard Medical School
>>> 300 Longwood Avenue
>>> Mailstop: BCH3092
>>> Enders 144.1
>>> Boston, MA 02115
>>> Tel: (617) 919-2972
>>> Fax: (617) 730-0817
>>> Harvard Scholar:
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__scholar.harvard.edu_guergana-5Fk-5Fsavova_biocv&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=FlURWGr18rKbgM76o8Hxoo1rbC2D2h-kk611lbKnPik&s=3taiTxFp55iQUnc6A6Yemg-XzFQrRjo5QZRQeKHQ29c&e=
>>>
>>> -----Original Message-----
>>> From: Azad Dehghan [mailto:azad.dehghan@gmail.com]
>>> Sent: Thursday, March 10, 2016 3:42 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: Combining Knowledge- and Data-driven Methods for
>>> De-identification of Clinical Narratives
>>>> This means both training data folders? I have access to the data but
>>>> not to the challenge description.
>>>
>>> Yes. Is there any specific information that you are missing?
>>>>
>>>>> It would be good to incorporate/refactor (basically, GATE API needs
>>>>> to be replaced with UIMA API to generate annotation) the two-pass
>>>>> recognition method for cTAKES - which has a wider application on
>>>>> longitudinal data.
>>>>> This method is used on top of a number of NERs.
>>>>
>>>> I'll take a look.
>>>>
>>>> I do not know how much time I can invest this month. Let's see how
>>>> many phases I can translate.
>>>> I added the rules for age. Are there jape rules for creating date
>>>> annotations?
>>> No. I believe cTAKES has existing component(s) to capture dates?
>>>
>>>> After all rules are translated, they need some major refactoring.
>>>> Jape and Ruta are quite different in some aspects.
>>> Ok.
>>>
>>>>
>>>>> Please let me know where I can help. I will be available again in
>>>>> April.
>>>>> Cheers,
>>>>> Azad
>>>>>
>>>>> On 10 March 2016 at 13:13, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> sorry, I was quite busy last month.
>>>>>>
>>>>>> I added a new patch, which needs to be applied.
>>>>>>
>>>>>> No new rules, but it's possible now to evaluate everything against
>>>>>> the labelled data of the challenge.
>>>>>>
>>>>>> @Azad:
>>>>>> Which documents exactly did you use to develop the rules?
>>>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
>>>>>> testing-PHI-Gold-fixed?
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> the last patch fixed almost all problems.
>>>>>>>
>>>>>>> I added another one that adds the csv file for the unit test and
>>>>>>> extends svn-ignore.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I added another patch. I forgot to manually add one test file to
>>>>>>>> version control, and there are still duplicate lines.
>>>>>>>> I hope this patch fixes the remaining problems.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>
>>>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry
>>>>>>>>> for the trouble, I should have looked more closely at the
>>>>>>>>> complete patch.
>>>>>>>>>
>>>>>>>>> I attached a new patch created with command-line tools, which
>>>>>>>>> looks correct now.
>>>>>>>>>
>>>>>>>>> Pei, can you apply the new patch?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>>>> Thanks Pei.
>>>>>>>>>>
>>>>>>>>>> I fear there was again a problem with the patch. All new files
>>>>>>>>>> are missing (and also the svn-ignore settings).
>>>>>>>>>>
>>>>>>>>>> Can you take a look?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>>>> patch applied.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Pei
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>> Hi Pei,
>>>>>>>>>>>>
>>>>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>>>>
>>>>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>>>>> But yeah, we can even create an extended type system to
>>>>>>>>>>>>> store these items temporarily and add them into the
>>>>>>>>>>>>> main/core type system afterwards.
>>>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed- it
>>>>>>>>>>>>> will require much more testing.  If it works, we can
>>>>>>>>>>>>> upgrade it in our sandbox area or create a branch if
>>>>>>>>>>>>> necessary.
>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Pei:
>>>>>>>>>>>>>> are there suitable annotation types in the cTAKES type
>>>>>>>>>>>>>> system? Some project in cTAKES uses something like
>>>>>>>>>>>>>> OntologyMatch... I map it to IdentifiedAnnotation right
>>>>>>>>>>>>>> now, but there are many empty features...
>>>>>>>>>>>>>> @Azad:
>>>>>>>>>>>>>> I changed the rules a bit, especially the capitalization
>>>>>>>>>>>>>> like I use it in ruta normally. The wordlists are
>>>>>>>>>>>>>> compiled to a trie by the maven plugin. I also added the
>>>>>>>>>>>>>> two regexes for url and email. I extended the regex for
>>>>>>>>>>>>>> the url. I also changed the evaluation order of some
>>>>>>>>>>>>>> rules (with @). Feel free to add simple examples to
>>>>>>>>>>>>>> examples.csv for the unit tests.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you wanna have help with the other rule sets? Or should
>>>>>>>>>>>>>> we split them up?
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> great. I will integrate them in the project and in the
>>>>>>>>>>>>>>> next patch.
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all
>>>>>>>>>>>>>>>> completed.
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehghan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are
>>>>>>>>>>>>>>>>> any more volunteers for translating JAPE to RUTA,
>>>>>>>>>>>>>>>>> please get in touch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten
>>>>>>>>>>>>>>>>>> about it. Unfortunately, there is just no spare time
>>>>>>>>>>>>>>>>>> right now. I hope I will be able to provide the
>>>>>>>>>>>>>>>>>> patches in December.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good
>>>>>>>>>>>>>>>>>>> starting point at least in terms of maven modules,
>>>>>>>>>>>>>>>>>>> etc.  I think it would be good if we use uimaFIT
>>>>>>>>>>>>>>>>>>> style as primary approach to wiring components
>>>>>>>>>>>>>>>>>>> together and generate desc's as secondary...
>>>>>>>>>>>>>>>>>>> I think the actual components that would be required
>>>>>>>>>>>>>>>>>>> is probably best left up to what is actually required
>>>>>>>>>>>>>>>>>>> for best performing c-deid.  The output would be
>>>>>>>>>>>>>>>>>>> interesting, I'm not sure if we should treat this as
>>>>>>>>>>>>>>>>>>> an independent preprocessing component or part of a
>>>>>>>>>>>>>>>>>>> pipeline (in which case, we may need to propose a
>>>>>>>>>>>>>>>>>>> change to the type system or perhaps an alternative
>>>>>>>>>>>>>>>>>>> JCas view.  You can probably open up that discussion
>>>>>>>>>>>>>>>>>>> to the dev group as you see fit.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an
>>>>>>>>>>>>>>>>>>>> example on how the cTAKES community develops or what
>>>>>>>>>>>>>>>>>>>> a project should look like?
>>>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects
>>>>>>>>>>>>>>>>>>>> in a quite different manner and I do not want to get
>>>>>>>>>>>>>>>>>>>> inspired by "some sort of out-dated" approach in the
>>>>>>>>>>>>>>>>>>>> cTAKES repo.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
>>>>>>>>>>>>>>>>>>>> preprocessing components that should be used and the
>>>>>>>>>>>>>>>>>>>> kind of "output" of the project?
>>>>>>>>>>>>>>>>>>>> Components: On which components may the components
>>>>>>>>>>>>>>>>>>>> rely: tokenizer, ... parser, ... dict lookup?
>>>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a
>>>>>>>>>>>>>>>>>>>> single AE?
>>>>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to
>>>>>>>>>>>>>>>>>>>>>> avoid duplicate work and to coordinate the
>>>>>>>>>>>>>>>>>>>>>> efforts ...
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta
>>>>>>>>>>>>>>>>>>>> Workbench if you want, or wait until I set up the
>>>>>>>>>>>>>>>>>>>> project with ruta integration.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized
>>>>>>>>>>>>>>>>>>>>>> for the initial development, and if yes, is it
>>>>>>>>>>>>>>>>>>>>>> possible to contribute it too?
>>>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly
>>>>>>>>>>>>>>>>>>>>> available; i2b2
>>>>>>>>>>>>>>>>>>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.i2b2.org_NLP_DataSets_Main.php&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=SeLHlpmrGNnJ9mI2WCgf_wwQk9zL4aIrVmfBoSi-j0kfEcrO4yRGmRCJNAr-rCmP&m=1Qpd4A2PgVD13w31PkkvmJf6I0PTCatCzgBgsnetPOg&s=aAEeORyMtz7NCv-6EEgiABVY_Rf6zLnJghQh2DA_CKQ&e=>
>>>>>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a
>>>>>>>>>>>>>>>>>>>>> given challenge; this is done on an individual
>>>>>>>>>>>>>>>>>>>>> basis and involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate
>>>>>>>>>>>>>>>>>>>>> the validation.
>>>>>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to
>>>>>>>>>>>>>>>>>>>> the dataset here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with
>>>>>>>>>>>>>>>>>>>>>> cTAKES components replacing the previous ANNIE
>>>>>>>>>>>>>>>>>>>>>> preprocessing)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd
>>>>>>>>>>>>>>>>>>>>>> party libs jars that were included to ensure
>>>>>>>>>>>>>>>>>>>>>> compatibility.  I’ll be sure to take a look at
>>>>>>>>>>>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> —Pei
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there
>>>>>>>>>>>>>>>>>>>>> should not be a need to worry about the 3rd party
>>>>>>>>>>>>>>>>>>>>> libs.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an
>>>>>>>>>>>>>>>>>>>>> independent component for the Two Pass recognition
>>>>>>>>>>>>>>>>>>>>> (TwoPass.java) as this method has proven useful for
>>>>>>>>>>>>>>>>>>>>> general NER on longitudinal data and is surely
>>>>>>>>>>>>>>>>>>>>> useful independent of the deid component.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>>>>

