ctakes-user mailing list archives

From: "Geise, Brandon D." <bdge...@geisinger.edu>
Subject: RE: MaxentParserWrapper
Date: Tue, 08 Mar 2016 00:08:00 GMT

Hi Guergana,

Thanks.  That's what I thought but wanted to confirm.

Thanks,
Brandon


_____________________________
From: Savova, Guergana <guergana.savova@childrens.harvard.edu>
Sent: Monday, March 7, 2016 6:58 PM
Subject: RE: MaxentParserWrapper
To: 'user@ctakes.apache.org' <user@ctakes.apache.org>


I believe it is the output of the dependency parser that is used as features for the assertion
modules. You can also substitute the rule-based, NegEx-inspired modules for the assertion
classifiers if you find the speed better that way.
--Guergana

From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
Sent: Monday, March 7, 2016 2:56 PM
To: user@ctakes.apache.org
Subject: RE: MaxentParserWrapper

One follow-up: is the constituency parser needed for good results with the assertion modules
(History, Generic, Uncertainty, etc.)?

From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Monday, March 07, 2016 11:01 AM
To: user@ctakes.apache.org
Subject: Re: MaxentParserWrapper

Hi Brandon,
I wrote the constituency parser module. It is basically a wrapper for the OpenNLP constituency
parser. The only thing our module does is convert from our type system into tokens for the
parser, run the parser, then convert the output back into our type system.

As for the slowness, there are known issues with extremely long sentences (I believe the
algorithm is cubic, n^3, in the length of the input, so this makes sense). But we have found
(Sean Finan pointed this out) that the problem often comes from upstream: strings of
punctuation used as section delimiters get misclassified and are tokenized/segmented as
super long sentences. I believe he implemented some workarounds in some of our pipelines to
recognize "non-real" sentences and have the parser skip them, but I don't know off the top of
my head where that is or whether it's checked in.
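
To make the idea concrete, here is a minimal sketch of that kind of guard. It is not the
workaround mentioned above and is not cTAKES or OpenNLP code: the class name ParseGuard, its
methods, and both thresholds are hypothetical and chosen only for illustration. It flags a
sentence as not worth parsing when it is empty, has too many whitespace-separated tokens, or
consists mostly of non-alphanumeric characters (as a delimiter line such as "-----" would).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of cTAKES or OpenNLP.
final class ParseGuard {
    // Assumed thresholds; tune them on your own documents.
    static final int MAX_TOKENS = 100;
    static final double MIN_ALNUM_RATIO = 0.5;

    /** Returns true if the sentence text looks like a real sentence worth parsing. */
    static boolean shouldParse(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Very long "sentences" are the expensive case for a cubic-time parser.
        int tokens = trimmed.split("\\s+").length;
        if (tokens > MAX_TOKENS) {
            return false;
        }
        // Count how much of the non-whitespace text is letters or digits.
        int alnum = 0;
        int nonSpace = 0;
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            nonSpace++;
            if (Character.isLetterOrDigit(c)) {
                alnum++;
            }
        }
        // nonSpace is at least 1 here because the trimmed text is not empty.
        return (double) alnum / nonSpace >= MIN_ALNUM_RATIO;
    }

    /** Keeps only the sentences that pass shouldParse, preserving order. */
    static List<String> filter(List<String> sentences) {
        List<String> kept = new ArrayList<>();
        for (String s : sentences) {
            if (shouldParse(s)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

In a real pipeline a check like ParseGuard.shouldParse(sentenceText) would run just before the
parser is called on each sentence, so delimiter lines and runaway segments are skipped rather
than parsed.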

Maybe Sean can chime in with more info if that sounds familiar.

Tim
On 03/07/2016 09:06 AM, Geise, Brandon D. wrote:
Hi,

Can someone point me to where I can dig deeper into the MaxentParserWrapper? I'm seeing
significant slowness once the pipeline reaches this point and would like to understand
what's going on a little better.

Thanks,
Brandon




