pdfbox-dev mailing list archives

From "Kai Keggenhoff (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PDFBOX-3646) Annotations parsed from XFDF containing ampersand characters are not properly imported
Date Wed, 17 Oct 2018 12:24:00 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Keggenhoff updated PDFBOX-3646:
-----------------------------------
    Description: 
Annotations containing "&" in their text are displayed incorrectly when they are parsed unmodified
from XFDF (where the ampersands are encoded as "&amp;") and added to a PDF document.
This occurs for both "text comment" and "text box" type annotations.
However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior to parsing,
the imported annotations are displayed correctly.
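
A minimal sketch of the workaround described above, assuming the raw XFDF is available as a Java string before it is handed to the XFDF parser. doubleEscapeAmpersands is a hypothetical helper name, and the XFDF fragment is illustrative only, not taken from the attached sample.xfdf.

```java
public class XfdfAmpersandWorkaround {

    // Workaround from this report (hypothetical helper): double-escape every
    // "&amp;" entity in the raw XFDF text before it is parsed.
    // String.replace makes a single left-to-right pass, so the "&amp;" inside
    // each newly inserted "&amp;amp;" is not replaced a second time.
    static String doubleEscapeAmpersands(String xfdf) {
        return xfdf.replace("&amp;", "&amp;amp;");
    }

    public static void main(String[] args) {
        // Illustrative fragment of an annotation's rich-text contents.
        String raw = "<p>Tom &amp; Jerry</p>";
        System.out.println(doubleEscapeAmpersands(raw)); // <p>Tom &amp;amp; Jerry</p>
    }
}
```

The patched string could then be parsed as usual, for example by wrapping it in an InputStream for FDFDocument.loadXFDF in PDFBox 2.0.x. Because this only compensates for the missing escaping, it would presumably produce a literal "&amp;" in the annotation text on a version where that escaping is fixed.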

The attached code produces two PDF files: one with the unmodified XFDF imported,
and one with the modified XFDF imported.

An XFDF file containing both a text box and a text comment annotation is embedded in the source and
attached as a separate file.

Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same corruption of merged
annotations occurs if the annotation text contains a "<" (encoded as the "&lt;" entity).

Update 17.10.2018: This corruption is caused by FDFAnnotation.richContentsToString. This
method takes the "<" and "&" characters from the parsed values in the document and writes them
unescaped into the markup, but these characters must be replaced with their entities ("&lt;" and "&amp;").
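
A minimal, self-contained sketch of the substitution described above, using only the JDK's DOM parser rather than PDFBox. escapeMarkupText, parseText and isWellFormed are hypothetical helper names, not methods of FDFAnnotation, and the sample text is made up. It shows that the parser decodes "&amp;" and "&lt;" to "&" and "<", that copying those characters into new markup unchanged yields markup that no longer parses, and that replacing them with their entities restores well-formed markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class RichTextEscaping {

    // Hypothetical helper: replace the two characters named in this report
    // with their XML entities before the text is written into markup.
    static String escapeMarkupText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else if (c == '<') {
                sb.append("&lt;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses the markup and returns the decoded text of its root element.
    // Throws if the markup is not well-formed.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement().getTextContent();
    }

    static boolean isWellFormed(String xml) {
        try {
            parseText(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // As stored in the XFDF: "&" and "<" appear as entities.
        String decoded = parseText("<p>Tom &amp; Jerry &lt;3</p>");
        // decoded is "Tom & Jerry <3"; copying it into new markup as-is breaks it.
        String naive = "<p>" + decoded + "</p>";
        String escaped = "<p>" + escapeMarkupText(decoded) + "</p>";
        System.out.println(isWellFormed(naive));   // false
        System.out.println(isWellFormed(escaped)); // true
        System.out.println(escaped);               // <p>Tom &amp; Jerry &lt;3</p>
    }
}
```

Only "&" and "<" are replaced here, matching the characters named above; ">" is permitted in XML text content, although escaping it as "&gt;" would also be harmless.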

I'll add this substitution to my proposed bugfix for PDFBOX-4345; please refer to https://issues.apache.org/jira/projects/PDFBOX/issues/PDFBOX-4345

  was:
Annotations containing "&" in their text are displayed incorrectly when they are parsed unmodified
from XFDF (where the ampersands are encoded as "&amp;") and added to a PDF document.
This occurs for both "text comment" and "text box" type annotations.
However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior
to parsing, the imported annotations are displayed correctly.

The attached code produces two PDF files: one with the unmodified XFDF imported,
and one with the modified XFDF imported.

An XFDF file containing both a text box and a text comment annotation is embedded in the source and
attached as a separate file.

Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same corruption of merged
annotations occurs if the annotation text contains a "<" (encoded as the "&lt;" entity).


> Annotations parsed from XFDF containing ampersand characters are not properly imported
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3646
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3646
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, PDModel
>    Affects Versions: 2.0.3, 2.0.4, 2.0.5, 2.0.6
>         Environment: java 1.8.0_112
>            Reporter: Kai Keggenhoff
>            Priority: Major
>              Labels: xfdf
>         Attachments: MergeTest.java, output1.pdf, output2.pdf, sample.xfdf
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


