Message-ID: <970766916.1231279184516.JavaMail.jira@brutus>
Date: Tue, 6 Jan 2009 13:59:44 -0800 (PST)
From: Andreas Lehmkühler (JIRA)
To: pdfbox-dev@incubator.apache.org
Reply-To: pdfbox-dev@incubator.apache.org
Subject: [jira] Resolved: (PDFBOX-390) org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
In-Reply-To: <1952833009.1227784604227.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved
PDFBOX-390.
---------------------------------------

    Resolution: Fixed

Fixed in version 732135

> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
>                 Key: PDFBOX-390
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-390
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Mathias Bosch
>             Fix For: 0.8.0-incubator
>
>         Attachments: 000161.pdf, ASCIIHexFilter_390-Patch.diff
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> According to the Specification (pdf_reference_1-7.pdf), all whitespace
> characters between the ASCII-Hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source decodes (or attempts to decode) those whitespace
> characters, and as a result the byte values are wrong (all characters that
> are not [0-9a-f] result in -1, but processing continues).
> This produces an invalid byte stream.
> The ASCIIHexDecode Filter section also defines the EOD (end-of-data) character
> of the byte stream as '>', which might ease the parsing of inline images.
> (The EI operator should follow the EOD in the case of an inline image.)
> Example of an ASCII-Hex encoded value, copied from the Spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF
>
> I fixed the problem so that I could continue with my work.
> I paste the changed code here as a hint that might help to fix the bug.
> public class ASCIIHexFilter
>     implements Filter
> {
>   /**
>    * Whitespace
>    *   0  0x00  Null (NUL)
>    *   9  0x09  Tab (HT)
>    *  10  0x0A  Line feed (LF)
>    *  12  0x0C  Form feed (FF)
>    *  13  0x0D  Carriage return (CR)
>    *  32  0x20  Space (SP)
>    */
>   protected boolean isWhitespace(int c) {
>     return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>   }
>
>   protected boolean isEOD(int c) {
>     return (c == 62); // '>' - EOD
>   }
>   /**
>    * {@inheritDoc}
>    */
>   public void decode(InputStream compressedData, OutputStream result, COSDictionary options, int filterIndex) throws IOException {
>     int value = 0;
>     int firstByte = 0;
>     int secondByte = 0;
>     while ((firstByte = compressedData.read()) != -1) {
>
>       // always after first char
>       while(isWhitespace(firstByte))
>         firstByte = compressedData.read();
>       if(isEOD(firstByte))
>         break;
>
>       if(REVERSE_HEX[firstByte] == -1)
>         System.out.println("Invalid Hex Code; int: " + firstByte + " char: " + (char) firstByte);
>       value = REVERSE_HEX[firstByte] * 16;
>       secondByte = compressedData.read();
>
>       if(isEOD(secondByte)) {
>         // second value behaves like 0 in case of EOD
>         result.write(value);
>         break;
>       }
>       if(secondByte >= 0) {
>         if(REVERSE_HEX[secondByte] == -1)
>           System.out.println("Invalid Hex Code; int: " + secondByte + " char: " + (char) secondByte);
>         value += REVERSE_HEX[secondByte];
>       }
>       result.write(value);
>     }
>
>     result.flush();
>   }
>   // .....................................................
>   // other code remains unchanged
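
For readers of the archive, the rules described in the report (skip whitespace, stop at '>', treat a missing final digit as 0) can be sketched as a stand-alone decoder. This is only an illustration under stated assumptions, not the change committed to PDFBox: the class name AsciiHexDecodeSketch and the helpers hexValue and nextSignificant are hypothetical, and no PDFBox types (Filter, COSDictionary, REVERSE_HEX) are used. Unlike the quoted patch, the sketch also skips whitespace between the two digits of a pair, checks for end of input before any table lookup, and throws on an invalid character instead of printing a message; the last point is a design choice made here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone sketch; not the PDFBox ASCIIHexFilter class. */
final class AsciiHexDecodeSketch {

    /** Whitespace as listed in the report: NUL, HT, LF, FF, CR, SP. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Returns 0..15 for a hex digit (either case), or -1 for anything else. */
    static int hexValue(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Reads the next non-whitespace byte; returns -1 at end of input. */
    static int nextSignificant(InputStream in) throws IOException {
        int c;
        do {
            c = in.read();
        } while (c != -1 && isWhitespace(c));
        return c;
    }

    /** Decodes hex pairs from in to out until '>' (EOD) or end of input. */
    static void decode(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int first = nextSignificant(in);
            if (first == -1 || first == '>') {
                break; // end of input or EOD before a new pair starts
            }
            int hi = hexValue(first);
            if (hi < 0) {
                throw new IOException("Invalid hex character: " + first);
            }
            int second = nextSignificant(in);
            if (second == -1 || second == '>') {
                out.write(hi << 4); // odd digit count: missing digit acts as 0
                break;
            }
            int lo = hexValue(second);
            if (lo < 0) {
                throw new IOException("Invalid hex character: " + second);
            }
            out.write((hi << 4) | lo);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = "FF CE\nA3 7>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        decode(new ByteArrayInputStream(encoded), out);
        for (byte b : out.toByteArray()) {
            System.out.printf("%02X ", b & 0xFF); // prints "FF CE A3 70 "
        }
    }
}
```

In the demo input, the trailing single digit 7 followed by '>' decodes to 0x70, matching the "second value behaves like 0 in case of EOD" comment in the quoted patch.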