pdfbox-users mailing list archives

From Thad Humphries <thad.humphr...@gmail.com>
Subject Re: ExtractMetadata error
Date Thu, 09 Mar 2017 15:51:50 GMT
Here's my code. As I said, it is throwing an exception at "new
DomXmpParser()" and I have no idea why:

  protected JSONObject getPdfMetadata(byte [] buffer)
      throws IOException, XmpParsingException, JSONException {
    ByteArrayInputStream bais = new ByteArrayInputStream(buffer);

    JSONObject json = new JSONObject();
    PDDocument document = null;
    try {
      document = PDDocument.load(bais);
      PDDocumentCatalog catalog = document.getDocumentCatalog();
      PDMetadata meta = catalog.getMetadata();

      if (meta != null) {
        DomXmpParser xmpParser = new DomXmpParser();  // throws exception
        XMPMetadata metadata = xmpParser.parse(meta.createInputStream());

        DublinCoreSchema dc = metadata.getDublinCoreSchema();
        if (dc != null) {
          JSONObject dcj = new JSONObject();
          dcj.put("Title", dc.getTitle());
          dcj.put("Description", dc.getDescription());
          ...
          json.put("Dublin", dcj);
        }
  ...

My goal is to return a JSON-formatted string to a browser and display the
formatted metadata to the user. So for now I'm getting around the
DomXmpParser exception by simply converting the metadata to JSON with
JSON-java (https://github.com/stleary/JSON-java) and untangling the
namespaces, etc. on the browser side:

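      PDMetadata meta = catalog.getMetadata();

      if (meta != null) {
        // Copy the raw XMP packet into memory.
        InputStream is = meta.exportXMPMetadata();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int read;
        byte[] bytes = new byte[8 * 1024];
        while ((read = is.read(bytes)) != -1) {
          baos.write(bytes, 0, read);
        }
        // Decode explicitly instead of using the platform default charset;
        // UTF-8 is assumed here (needs java.nio.charset.StandardCharsets).
        String xml = new String(baos.toByteArray(), StandardCharsets.UTF_8);
        json = XML.toJSONObject(xml);
    ...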
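
The copy loop in this workaround can be pulled into a small helper that also closes the stream. This is only a sketch using JDK classes: the names StreamText and readUtf8 are hypothetical, and UTF-8 is an assumption about the XMP packet's encoding, not something guaranteed for every PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox or JSON-java.
class StreamText {
    /** Reads the whole stream, closes it, and decodes the bytes as UTF-8 (assumed encoding). */
    public static String readUtf8(InputStream in) throws IOException {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8 * 1024];
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The returned string could then be handed to XML.toJSONObject in place of the inline loop.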


On Wed, Mar 8, 2017 at 10:11 PM, Thad Humphries <thad.humphries@gmail.com>
wrote:

> When I run the org.apache.pdfbox.examples.pdmodel.ExtractMetadata
> example, it works. However when I put the same code into my class, it
> throws an exception when I call "DomXmpParser xmpParser = new
> DomXmpParser();"  The trace is:
>
> java.lang.AbstractMethodError: javax.xml.parsers.DocumentBuilderFactory.setFeature(Ljava/lang/String;Z)V
> at org.apache.xmpbox.xml.DomXmpParser.<init>(DomXmpParser.java:81)
> at com.jthad.util.image.MetadataExtractor.getPdfMetadata(MetadataExtractor.java:170)
> at com.jthad.util.image.TestMetadataExtractor.testPdf0(TestMetadataExtractor.java:41)
> ...
>
> Line 81 in DomXmpParser.java is
>
> dbFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
>
> I am at a loss to understand how "new DomXmpParser()" works from the
> command line but fails when called by a JUnit test in Eclipse.
>
> --
> "Hell hath no limits, nor is circumscrib'd In one self-place; but where we
> are is hell, And where hell is, there must we ever be" --Christopher
> Marlowe, *Doctor Faustus* (v. 121-24)
>
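
An AbstractMethodError on DocumentBuilderFactory.setFeature usually means the factory class found at run time predates JAXP 1.3, where that method was introduced. An older XML parser jar on the Eclipse/JUnit classpath would explain why the command-line run behaves differently, but that is an assumption, not something the trace proves. A minimal sketch for checking it, using only JDK classes (the class name JaxpDiagnostics and its method names are made up for illustration):

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical diagnostic class, not part of PDFBox or xmpbox.
class JaxpDiagnostics {
    /** Names the DocumentBuilderFactory implementation in use and where it was loaded from. */
    public static String describeFactory() {
        Class<?> impl = DocumentBuilderFactory.newInstance().getClass();
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap class loader may report no code source.
        String location = (source == null || source.getLocation() == null)
                ? "the JDK itself (no code source reported)"
                : source.getLocation().toString();
        return impl.getName() + " loaded from " + location;
    }

    /** Returns false only if the factory lacks an implementation of setFeature. */
    public static boolean supportsSetFeature() {
        try {
            DocumentBuilderFactory.newInstance().setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            return true;
        } catch (ParserConfigurationException e) {
            return true;  // method exists; this particular feature is just not recognized
        } catch (AbstractMethodError e) {
            return false; // factory implementation predates setFeature
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
        System.out.println("setFeature implemented: " + supportsSetFeature());
    }
}
```

Calling these two methods from the failing JUnit test and comparing the output with a command-line run should show whether the two environments pick up different factory implementations.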



