pdfbox-users mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject extracting embedded documents -- will getEmbeddedFile() alone miss embedded DOS/Unix/Mac files?
Date Wed, 23 Jul 2014 18:21:34 GMT
All,

  Over on Tika, it looks like we copied org.apache.pdfbox.examples.pdmodel.ExtractEmbeddedFiles
to extract embedded files.  Looking at the source code for PDComplexFileSpecification, I
notice that getEmbeddedFile() does not behave like getFilename(): it does not iterate
through the various platform-specific entries and return the first non-null one.

  When we try to get the PDEmbeddedFile, should we try each of these instead of just getEmbeddedFile()?



  getEmbeddedFile()
  getEmbeddedFileDos()
  getEmbeddedFileUnix()
  getEmbeddedFileMac()



  Will getEmbeddedFile() alone potentially miss embedded files?
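
  To make the question concrete, here is a minimal sketch of the fallback I have in mind. It is not PDFBox or Tika code: EmbeddedFileFallback and firstNonNull are hypothetical names, and the helper is written against java.util.function.Supplier so that it stands alone. The idea is only to try each getter in turn and keep the first non-null result, the way getFilename() appears to.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of PDFBox or Tika.
final class EmbeddedFileFallback {

    private EmbeddedFileFallback() {
    }

    // Calls each supplier in order and returns the first non-null value,
    // or null if every supplier returns null.
    @SafeVarargs
    static <T> T firstNonNull(Supplier<? extends T>... candidates) {
        for (Supplier<? extends T> candidate : candidates) {
            T value = candidate.get();
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}

// Assumed usage, given a PDComplexFileSpecification named spec:
//
//   PDEmbeddedFile file = EmbeddedFileFallback.firstNonNull(
//           spec::getEmbeddedFile,
//           spec::getEmbeddedFileDos,
//           spec::getEmbeddedFileUnix,
//           spec::getEmbeddedFileMac);
```

  The order in the usage comment simply follows the list above; whether that is the right precedence is part of what I am asking.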



  Thank you.

  Best,
  Tim
