pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] Resolved: (PDFBOX-390) org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
Date Tue, 06 Jan 2009 21:59:44 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-390.
---------------------------------------

    Resolution: Fixed

Fixed in revision 732135

> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> ---------------------------------------------------------
>
>                 Key: PDFBOX-390
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-390
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.8.0-incubator
>            Reporter: Mathias Bosch
>             Fix For: 0.8.0-incubator
>
>         Attachments: 000161.pdf, ASCIIHexFilter_390-Patch.diff
>
>
> org.pdfbox.filter.ASCIIHexFilter does not skip Whitespace
> According to the specification (pdf_reference_1-7.pdf), all whitespace
> characters between the ASCII-hex values have to be skipped (see 3.3.1
> ASCIIHexDecode Filter).
> The 0.8.0-incubator source decodes (or attempts to decode) those whitespace
> characters, so the resulting byte values are wrong: every character that
> is not in [0-9a-f] maps to -1, yet processing continues.
> This produces an invalid byte stream.
> The ASCIIHexDecode Filter section also defines '>' as the EOD (end-of-data)
> marker of the byte stream, which might ease the parsing of inline images.
> (The EI operator should follow the EOD in the case of an inline image.)
> Example of an ASCII-hex encoded value, copied from the spec:
> FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >
> I fixed the problem locally so that I could continue with my work.
> I am pasting the changed code here as a hint that might help fix the bug.
> public class ASCIIHexFilter
>   implements Filter
> {
>  /**
>   * Whitespace
>   *   0  0x00  Null (NUL)
>   *   9  0x09  Tab (HT)
>   *  10  0x0A  Line feed (LF)
>   *  12  0x0C  Form feed (FF)
>   *  13  0x0D  Carriage return (CR)
>   *  32  0x20  Space (SP)  
>   */
>   protected boolean isWhitespace(int c) {
>     return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
>   }
>   
>   protected boolean isEOD(int c) {
>     return (c == 62); // '>' - EOD
>   }
>   /**
>    * {@inheritDoc}
>    */
>   public void decode(InputStream compressedData, OutputStream result, COSDictionary options, int filterIndex) throws IOException {
>     int value = 0;
>     int firstByte = 0;
>     int secondByte = 0;
>     while ((firstByte = compressedData.read()) != -1) {
>       
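>       // skip any whitespace before the first hex digit of a pair
>       while(isWhitespace(firstByte))
>         firstByte = compressedData.read();
>       // stop at EOD, or at end of stream reached while skipping whitespace
>       if(firstByte == -1 || isEOD(firstByte))
>         break;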
>       
>       if(REVERSE_HEX[firstByte] == -1)
>         System.out.println("Invalid Hex Code; int: " + firstByte + " char: " + (char) firstByte);
>       value = REVERSE_HEX[firstByte] * 16;
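>       secondByte = compressedData.read();
>       // whitespace may also separate the two hex digits of a pair
>       while(isWhitespace(secondByte))
>         secondByte = compressedData.read();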
>       
>       if(isEOD(secondByte)) {
>         // second value behaves like 0 in case of EOD
>         result.write(value);
>         break;
>       }
>       if(secondByte >= 0) {
>         if(REVERSE_HEX[secondByte] == -1)
>           System.out.println("Invalid Hex Code; int: " + secondByte + " char: " + (char) secondByte);
>         value += REVERSE_HEX[secondByte];
>       }
>       result.write(value);
>     }
>     
>     result.flush();
>   }
> // .....................................................
> // other code remains unchanged
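
For reference, the decoding rules described in the report (skip whitespace, stop at '>', treat a missing final digit as 0) can be sketched as a standalone program. This is only an illustration under stated assumptions, not the code committed to PDFBox: the class name AsciiHexSketch and the helper nextSignificant are hypothetical, the sketch does not implement the PDFBox Filter interface, and rejecting invalid digits with an exception is a choice made here, whereas the pasted patch prints a message and continues.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone sketch; not the PDFBox ASCIIHexFilter class.
class AsciiHexSketch {

    // PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Returns the next non-whitespace byte, or -1 at end of stream.
    static int nextSignificant(InputStream in) throws IOException {
        int c = in.read();
        while (isWhitespace(c)) {
            c = in.read();
        }
        return c;
    }

    // Decodes ASCII-hex data until the EOD marker '>' or end of stream.
    static void decode(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int first = nextSignificant(in);
            if (first == -1 || first == '>') {
                break;
            }
            int hi = Character.digit(first, 16);
            if (hi < 0) {
                // Assumption: fail fast instead of writing a corrupt byte.
                throw new IOException("Invalid hex digit, int: " + first);
            }
            int second = nextSignificant(in);
            if (second == -1 || second == '>') {
                // An odd number of digits: the missing digit behaves like 0.
                out.write(hi << 4);
                break;
            }
            int lo = Character.digit(second, 16);
            if (lo < 0) {
                throw new IOException("Invalid hex digit, int: " + second);
            }
            out.write((hi << 4) | lo);
        }
        out.flush();
    }

    // Convenience wrapper for in-memory data.
    static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            decode(new ByteArrayInputStream(encoded), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Example value quoted in the report from the PDF specification.
        String spec = "FF CE A3 7C 5B 3F 28 16 0A 02 00 02 0A 16 28 3F 5B 7C A3 CE FF >";
        byte[] decoded = decode(spec.getBytes(StandardCharsets.US_ASCII));
        System.out.println(decoded.length + " bytes decoded");
    }
}
```

Routing both digits of a pair through the same whitespace-skipping helper means end of stream is always checked before any table or digit lookup, so trailing whitespace without an EOD marker ends decoding cleanly.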

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

