tika-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)
Date Mon, 13 Aug 2012 17:28:38 GMT

    [ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433333#comment-13433333 ]

Markus Jelsma commented on TIKA-961:


I'll see if I can provide a test, but ideally I'd need an example unit test that I can build
upon. Do you have any?

I can confirm that whitespace is being added in the case of an element embedded in the middle
of text without surrounding whitespace. This is a problem that I could not solve, and it is
also a problem with the standard HTML parser, IIRC.

> No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)
> ----------------------------------------------------------------------
>                 Key: TIKA-961
>                 URL: https://issues.apache.org/jira/browse/TIKA-961
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>            Reporter: Markus Jelsma
>            Assignee: Ken Krugler
>             Fix For: 1.3
>         Attachments: TIKA-961-1.3-1.patch
> ignorableWhitespace is not properly added when using the BoilerpipeContentHandler with
> markup included.
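
For readers unfamiliar with the SAX callback involved, here is a minimal sketch of the symptom described in the issue. It is not Tika's or Boilerpipe's code: `WhitespaceDemo`, `TextCollector` and `replay` are hypothetical names, and the event sequence is an assumed example of what a parser might emit for two adjacent block elements separated by a newline. It only shows that a handler which drops `ignorableWhitespace` events runs the surrounding text together.

```java
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {

    /** Hypothetical handler: collects text and optionally keeps ignorable whitespace. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private final boolean keepIgnorable;

        TextCollector(boolean keepIgnorable) {
            this.keepIgnorable = keepIgnorable;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Dropping these events is what makes adjacent words run together.
            if (keepIgnorable) {
                out.append(ch, start, length);
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Replays an assumed event sequence: text, ignorable newline, text. */
    static String replay(boolean keepIgnorable) {
        TextCollector handler = new TextCollector(keepIgnorable);
        char[] foo = "foo".toCharArray();
        char[] newline = "\n".toCharArray();
        char[] bar = "bar".toCharArray();
        handler.characters(foo, 0, foo.length);
        handler.ignorableWhitespace(newline, 0, newline.length);
        handler.characters(bar, 0, bar.length);
        return handler.text();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // whitespace dropped: prints "foobar"
        System.out.println(replay(true));  // whitespace kept: "foo" and "bar" on separate lines
    }
}
```

A unit test for the actual handler would presumably make the same kind of comparison against its output, but the exact Tika test setup is not shown in this message.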
