pdfbox-dev mailing list archives

From "Oliver Mannion (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PDFBOX-4521) Missing Info value from file trailer: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
Date Sat, 20 Apr 2019 11:05:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822410#comment-16822410 ]

Oliver Mannion edited comment on PDFBOX-4521 at 4/20/19 11:04 AM:
------------------------------------------------------------------

[^Editathon_cheat_sheet_(EN)_MetaDefender.pdf] is an example PDF that generated the exception
and has the following file trailer:

{code:java}
<</Type/XRef/Filter/FlateDecode/DecodeParms <</Predictor 12/Columns 6>>/W
[1 3 2]/Length 116/Size 65/Root 5 0 R/Info /ID >>{code}
It was generated by passing [^Editathon_cheat_sheet_(EN).pdf] through MetaDefender 4.14.3.
The original file's trailer omits the {{/Info}} key altogether, and PDFBox parses it without
error:

{code:java}
<< /Type /XRef /Length 17 /Filter /FlateDecode /DecodeParms << /Columns 5 /Predictor 12 >>
/W [ 1 3 1 ] /Size 2 /ID [<acbd8f83222179fe43e942f9f5e3128a><acbd8f83222179fe43e942f9f5e3128a>]
>>{code}
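
The difference between the two trailers above is that the rewritten one has {{/Info}} followed directly by the next key ({{/ID}}), so a parser that pairs keys with values will read the name {{/ID}} as the value of {{Info}}. As a rough illustration only, the sketch below flags that pattern in trailer text with a regular expression. The class and method names are hypothetical, it is not part of PDFBox, and it assumes it is given just the text of the trailer dictionary rather than the whole file.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a PDFBox class.
final class TrailerInfoCheck {
    // Matches "/Info" followed (after optional whitespace) by another name token,
    // i.e. the Info key with no value of its own before the next key starts.
    private static final Pattern INFO_THEN_NAME = Pattern.compile("/Info\\s*/");

    static boolean infoKeyLacksValue(String trailerText) {
        return INFO_THEN_NAME.matcher(trailerText).find();
    }

    public static void main(String[] args) {
        // Trailer text copied from the MetaDefender-processed file.
        String rewritten = "<</Type/XRef/Filter/FlateDecode/DecodeParms <</Predictor 12/Columns 6>>"
                + "/W [1 3 2]/Length 116/Size 65/Root 5 0 R/Info /ID >>";
        // Trailer text copied from the original file, which has no Info key.
        String original = "<< /Type /XRef /Length 17 /Filter /FlateDecode "
                + "/DecodeParms << /Columns 5 /Predictor 12 >> /W [ 1 3 1 ] /Size 2 /ID "
                + "[<acbd8f83222179fe43e942f9f5e3128a><acbd8f83222179fe43e942f9f5e3128a>] >>";
        System.out.println(infoKeyLacksValue(rewritten)); // prints true
        System.out.println(infoKeyLacksValue(original));  // prints false
    }
}
```

A textual match like this is only a triage aid: it does not tokenize the dictionary and would also fire on the same byte pattern anywhere else in the text it is given.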
 


was (Author: oliman):
[^Editathon_cheat_sheet_(EN)_MetaDefender.pdf] is an example PDF that generated the exception
and has the following file trailer:

{code:java}
<</Type/XRef/Filter/FlateDecode/DecodeParms <</Predictor 12/Columns 6>>/W
[1 3 2]/Length 116/Size 65/Root 5 0 R/Info /ID >>{code}
It was generated by passing [^Editathon_cheat_sheet_(EN).pdf] through MetaDefender 4.14.3.
The original file trailer is missing the {{/Info}} section altogether:

{code:java}
<< /Type /XRef /Length 17 /Filter /FlateDecode /DecodeParms << /Columns 5 /Predictor 12 >>
/W [ 1 3 1 ] /Size 2 /ID [<acbd8f83222179fe43e942f9f5e3128a><acbd8f83222179fe43e942f9f5e3128a>]
>>{code}
 

> Missing Info value from file trailer: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4521
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4521
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.15
>            Reporter: Oliver Mannion
>            Priority: Major
>         Attachments: Editathon_cheat_sheet_(EN).pdf, Editathon_cheat_sheet_(EN)_MetaDefender.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The following exception
> {code:java}
> Cause: java.lang.ClassCastException: org.apache.pdfbox.cos.COSName cannot be cast to org.apache.pdfbox.cos.COSDictionary
>     at org.apache.pdfbox.pdmodel.PDDocument.getDocumentInformation(PDDocument.java:740)
>     at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:242)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> {code}
> is thrown for PDF documents that have no value in the file trailer for the {{Info}} key, e.g.:
> {code:java}
> << /Size 50/Root 8 0 R/Info /ID >>
> {code}
> According to the [PDF spec|http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_1-7.pdf],
> the {{Info}} key is optional. PDFBox correctly handles a trailer that has no {{Info}} key at all,
> but in this case the key is present without a value.
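
The stack trace above points at an unchecked cast of the {{Info}} value to a dictionary. A minimal sketch of the defensive alternative is shown below: test the value's type before casting and treat anything else as "no Info". All types and names here ({{PdfObject}}, {{PdfName}}, {{PdfDictionary}}, {{findInfo}}) are hypothetical stand-ins so the example is self-contained; they are not PDFBox classes, and this is not presented as the fix PDFBox adopted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for a PDF object model; these are NOT PDFBox classes.
interface PdfObject {}

final class PdfName implements PdfObject {
    final String value;
    PdfName(String value) { this.value = value; }
}

final class PdfDictionary implements PdfObject {
    final Map<String, PdfObject> entries = new LinkedHashMap<>();
}

final class InfoLookupSketch {
    // Returns the Info dictionary, or null when the key is absent or its value
    // is not a dictionary (for example the stray name read from "/Info /ID").
    static PdfDictionary findInfo(PdfDictionary trailer) {
        PdfObject value = trailer.entries.get("Info");
        if (value instanceof PdfDictionary) {
            return (PdfDictionary) value;
        }
        return null; // a blind cast here would throw ClassCastException
    }

    public static void main(String[] args) {
        PdfDictionary malformed = new PdfDictionary();
        malformed.entries.put("Info", new PdfName("ID")); // models "/Info /ID >>"
        System.out.println(findInfo(malformed) == null);  // prints true
    }
}
```

Returning null keeps the malformed case on the same path as a trailer with no {{Info}} key, which the report says is already handled; a caller could equally log a warning or substitute an empty dictionary.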



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)