ctakes-user mailing list archives

From "Geise, Brandon D." <bdge...@geisinger.edu>
Subject RE: MaxentParserWrapper
Date Mon, 07 Mar 2016 19:55:48 GMT
One follow-up: is the constituency parser needed to get good results from the assertion modules
(History, Generic, Uncertainty, etc.)?

From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Monday, March 07, 2016 11:01 AM
To: user@ctakes.apache.org
Subject: Re: MaxentParserWrapper

Hi Brandon,
I wrote the constituency parser module. It is basically a wrapper for the OpenNLP constituency
parser. All our module does is convert annotations from our type system into tokens for the
parser, run the parser, and then convert the output back into our type system.
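
The three steps above can be sketched in a language-neutral way. This is a minimal sketch in Python, not the actual cTAKES code (which is Java and works on UIMA annotations). `Token`, `Constituent`, `parse_sentence`, and the tree shape (a `(label, children)` pair whose leaves are token indices) are hypothetical stand-ins, not the cTAKES or OpenNLP API. The parser is passed in as a plain function so a stub can take the place of OpenNLP.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Token:
    text: str
    begin: int  # character offset where the token starts
    end: int    # character offset just past the token


@dataclass
class Constituent:
    label: str
    begin: int
    end: int


# Hypothetical tree shape: (label, children); a leaf child is a token index.
Tree = Tuple[str, List[Union["Tree", int]]]


def to_constituents(tree: Tree, tokens: List[Token]) -> List[Constituent]:
    """Map a tree over token indices back to character-offset annotations."""
    out: List[Constituent] = []

    def walk(node) -> Tuple[int, int]:
        if isinstance(node, int):  # leaf: a single token index
            return node, node
        label, children = node  # assumes every internal node has children
        spans = [walk(child) for child in children]
        first, last = spans[0][0], spans[-1][1]
        # Post-order: children are appended before their parent.
        out.append(Constituent(label, tokens[first].begin, tokens[last].end))
        return first, last

    walk(tree)
    return out


def parse_sentence(tokens: List[Token],
                   parser: Callable[[List[str]], Tree]) -> List[Constituent]:
    words = [t.text for t in tokens]      # 1. our annotations -> parser input
    tree = parser(words)                  # 2. run the (hypothetical) parser
    return to_constituents(tree, tokens)  # 3. parser output -> our annotations
```

Keeping the token offsets on the side and only handing plain strings to the parser is what lets the output be mapped back onto the original document text.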

As for the slowness, extremely long sentences are a known problem (I believe the algorithm
is O(n^3) in the sentence length, so this makes sense). But we have found (Sean Finan pointed
this out) that the problem often comes from upstream: strings of punctuation used as section
delimiters get misclassified and are tokenized/segmented as very long sentences. I believe he
implemented workarounds in some of our pipelines that recognize these "non-real" sentences and
have the parser skip them, but I don't know off the top of my head where that code is or whether
it's checked in.
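
Sean's workaround is not shown in this thread, so the following is not it; it is a hypothetical sketch of the kind of guard described, in Python for illustration. It rejects a "sentence" that is too long or is dominated by tokens with no letters or digits, before it reaches the parser. The thresholds `MAX_TOKENS` and `MIN_WORD_FRACTION` are made-up values, not cTAKES settings. If parsing cost really does grow roughly cubically, a 200-token pseudo-sentence costs about (200/20)^3 = 1000 times as much as a typical 20-token sentence, which is why skipping even a few of them can matter.

```python
from typing import List

# Hypothetical thresholds for illustration only; not values used by cTAKES.
MAX_TOKENS = 100
MIN_WORD_FRACTION = 0.5


def is_word(token: str) -> bool:
    """True if the token contains at least one letter or digit."""
    return any(ch.isalnum() for ch in token)


def looks_like_real_sentence(tokens: List[str],
                             max_tokens: int = MAX_TOKENS,
                             min_word_fraction: float = MIN_WORD_FRACTION) -> bool:
    """Heuristic guard: reject empty, overly long, or punctuation-dominated 'sentences'."""
    if not tokens or len(tokens) > max_tokens:
        return False
    words = sum(1 for t in tokens if is_word(t))
    return words / len(tokens) >= min_word_fraction


def relative_cost(n: int, baseline: int = 20) -> float:
    """Cost of an n-token sentence relative to a baseline, ASSUMING cubic scaling."""
    return (n / baseline) ** 3
```

A pipeline would call `looks_like_real_sentence` on each sentence's tokens and simply leave rejected sentences unparsed; a row of dashes followed by a section heading fails the word-fraction test even though it is well under the length cap.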

Maybe Sean can chime in with more info if that sounds familiar.


On 03/07/2016 09:06 AM, Geise, Brandon D. wrote:

Can someone point me in the direction of where I can dig deeper into the MaxentParserWrapper?
I'm seeing significant slowdowns once I get to this point in the pipeline and would like to
understand what's going on a little better.

