incubator-ctakes-notifications mailing list archives

From "Tim Miller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CTAKES-60) Null pointer error with empty sentences
Date Thu, 01 Nov 2012 17:39:12 GMT

    [ https://issues.apache.org/jira/browse/CTAKES-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488867#comment-13488867 ]

Tim Miller commented on CTAKES-60:
----------------------------------

This problem is a little more complex than I thought.  I think the problem is that the
dependency parser depends on the POS tagger, and thus expects every token to have a POS tag.
However, the POS tagger for some reason does not tag single-token sentences.  So in the example
above, the second sentence is just a period, which will not get a POS tag in the dependency
parser pipeline.  Similarly, I also found this error with a sentence of the form "( This is a
sentence. )"  (i.e. a sentence surrounded by parens).  The close paren is taken as its own
sentence, and again the token gets no POS tag.
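
To make the failure mode concrete, here is a minimal Java sketch. It is an illustration
under stated assumptions, not cTAKES code: Token, findUntagged, and tagLengthSum are
hypothetical names, and a null pos field stands in for a token that no component tagged.
It shows a consumer that assumes every token has a tag failing on a single-token sentence,
and a check that would flag such tokens first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT cTAKES types.
class PosTagCheck {

    /** A token with its text and a POS tag; pos == null models "no tag assigned". */
    static final class Token {
        final String text;
        final String pos;

        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Returns the tokens in a sentence that have no POS tag. */
    static List<Token> findUntagged(List<Token> sentence) {
        List<Token> untagged = new ArrayList<>();
        for (Token t : sentence) {
            if (t.pos == null) {
                untagged.add(t);
            }
        }
        return untagged;
    }

    /** Models a downstream consumer that assumes every token is tagged. */
    static int tagLengthSum(List<Token> sentence) {
        int sum = 0;
        for (Token t : sentence) {
            sum += t.pos.length(); // throws NullPointerException when pos is null
        }
        return sum;
    }

    public static void main(String[] args) {
        // A one-token "sentence" consisting of a lone period, left untagged.
        List<Token> lonePeriod = Arrays.asList(new Token(".", null));

        System.out.println("untagged tokens: " + findUntagged(lonePeriod).size());
        try {
            tagLengthSum(lonePeriod);
        } catch (NullPointerException e) {
            System.out.println("consumer failed on an untagged token");
        }
    }
}
```

A check along these lines only reports the symptom; it does not explain why the tagger
skips single-token sentences.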

Here comes the twist: The default pipeline (aggregatePlaintextUMLSProcessor) gives this token
a POS tag!  So something weird is going on where a different component is assigning POS tags
to tokens that lack them.  I've tracked it down to the chunker.  Since the dependency parser
pipeline does not use that component, it gets these occasional errors.

A quick workaround is simply adding the chunker to the dependency parser pipeline, but that
will certainly not be intuitive for new users.  I think it is worth looking into why this
behavior happens in the chunker and seeing if it can be moved back to the POS tagger.
                
> Null pointer error with empty sentences
> ---------------------------------------
>
>                 Key: CTAKES-60
>                 URL: https://issues.apache.org/jira/browse/CTAKES-60
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-chunker, ctakes-dependency-parser, ctakes-pos-tagger
>            Reporter: Tim Miller
>
> Null pointer exception in SRL module caused by certain ill-formed sentences (that other
> components handle gracefully).
> Smallest workable example input:
> I encouraged exercise. She needs a vaccine still but we don't have any moer now. . She
> will follow up with me in 4 months' time and also with her primary care physician.
> </example>
> The problem is something to do with the double period.  Running this example in the UIMA-CVD
> with the AE located in "desc/analysis_engine/ClearParserSRLPlaintextAggregate.xml" produces
> the error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
