Date: Wed, 17 Oct 2018 12:24:00 +0000 (UTC)
From: "Kai Keggenhoff (JIRA)"
To: dev@pdfbox.apache.org
Subject: [jira] [Updated] (PDFBOX-3646) Annotations parsed from XFDF containing ampersand characters are not properly imported

     [ https://issues.apache.org/jira/browse/PDFBOX-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Keggenhoff updated PDFBOX-3646:
-----------------------------------
    Description:
Annotations containing "&" in their text are displayed incorrectly when parsed unmodified from XFDF (the ampersands are encoded as "&amp;" there) and added to a PDF document.
This occurs for both "text comment" and "text box" type annotations.
However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior to parsing, the imported annotations are then displayed correctly.
The attached code produces two PDF files. One is the PDF with the unmodified XFDF imported, the other is the PDF with the modified XFDF.
An XFDF containing both a text box and a text comment annotation is embedded in the source and attached as a separate file.

Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same corruption of merged annotations occurs if the annotation text contains a "<" (encoded as "lt" entity).

Update 17.10.2018: This corruption is caused by FDFAnnotation.richContentsToString. This method reads "<" and "&" from the parsed values in the document and puts them as such into the markup, but these characters must be replaced with their entities.
I'll add this substitution to my proposed bugfix for PDFBOX-4345; please refer to https://issues.apache.org/jira/projects/PDFBOX/issues/PDFBOX-4345

  was:
Annotations containing "&" in their text are displayed incorrectly when parsed unmodified from XFDF (the ampersands are encoded as "&amp;" there) and added to a PDF document.
This occurs for both "text comment" and "text box" type annotations.
However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior to parsing, the imported annotations are then displayed correctly.
The attached code produces two PDF files. One is the PDF with the unmodified XFDF imported, the other is the PDF with the modified XFDF.
An XFDF containing both a text box and a text comment annotation is embedded in the source and attached as a separate file.

Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same corruption of merged annotations occurs if the annotation text contains a "<" (encoded as "lt" entity).


> Annotations parsed from XFDF containing ampersand characters are not properly imported
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3646
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3646
>             Project: PDFBox
>          Issue Type: Bug
>          Components: AcroForm, PDModel
>    Affects Versions: 2.0.3, 2.0.4, 2.0.5, 2.0.6
>         Environment: java 1.8.0_112
>            Reporter: Kai Keggenhoff
>            Priority: Major
>              Labels: xfdf
>         Attachments: MergeTest.java, output1.pdf, output2.pdf, sample.xfdf
>
>
> Annotations containing "&" in their text are displayed incorrectly when parsed unmodified from XFDF (the ampersands are encoded as "&amp;" there) and added to a PDF document.
> This occurs for both "text comment" and "text box" type annotations.
> However, if the XFDF is modified by replacing "&amp;" with "&amp;amp;" prior to parsing, the imported annotations are then displayed correctly.
> The attached code produces two PDF files. One is the PDF with the unmodified XFDF imported, the other is the PDF with the modified XFDF.
> An XFDF containing both a text box and a text comment annotation is embedded in the source and attached as a separate file.
> Update 23.03.2017: This problem persists in 2.0.5, and we noticed that the same corruption of merged annotations occurs if the annotation text contains a "<" (encoded as "lt" entity).
> Update 17.10.2018: This corruption is caused by FDFAnnotation.richContentsToString. This method reads "<" and "&" from the parsed values in the document and puts them as such into the markup, but these characters must be replaced with their entities.
> I'll add this substitution to my proposed bugfix for PDFBOX-4345; please refer to https://issues.apache.org/jira/projects/PDFBOX/issues/PDFBOX-4345

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
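
For illustration, the substitution described in the 17.10.2018 update could look roughly like the sketch below. It is not the actual PDFBox code or the patch attached to PDFBOX-4345. The class name RichTextEscaper and the method escape are hypothetical, and escaping ">" as well is an extra assumption beyond the "<" and "&" named in the report. The idea is that text read from already-parsed nodes contains the literal characters, so they have to be turned back into entities before being appended to the rich-text markup string.

```java
// Hypothetical helper, not part of PDFBox: re-escapes characters that are
// special in XML text before they are written back into rich-text markup.
final class RichTextEscaper {

    private RichTextEscaper() {
        // static utility, no instances
    }

    // Replaces '&', '<' and '>' with their predefined XML entities.
    // Handling '>' is an assumption; the report only names '<' and '&'.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Text as it comes out of the XML parser (entities already decoded).
        System.out.println(escape("Tom & Jerry <3")); // Tom &amp; Jerry &lt;3
    }
}
```

A method that rebuilds markup from parsed nodes would call such a helper on every text node value and leave the element tags it generates itself untouched.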