pdfbox-users mailing list archives

From Thad Humphries <thad.humphr...@gmail.com>
Subject Re: ExtractMetadata error
Date Fri, 10 Mar 2017 20:14:02 GMT
Figured it out, and fixed it. The problem was not with xmpbox, but with the
fact that the project build was bringing in JDOM and Jaxen as part of another
package. Moving Jaxen from the parent project to the package where it's
needed took it out of my image and PDF handling package, and fixed the
exception.
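
A quick way to check this kind of classpath conflict is to print which
DocumentBuilderFactory implementation the JVM actually picks up and where it
was loaded from. The sketch below is only a diagnostic aid. The class name
JaxpDiagnostics and the helper describe() are made up for illustration. The
thread does not confirm it, but one plausible explanation is that Jaxen pulls
in an old XML parser jar whose factory predates setFeature(), and a check
like this would show that jar's location.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, not part of PDFBox or xmpbox.
class JaxpDiagnostics {

    // Returns the class name plus the jar or directory it was loaded from.
    static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "JDK runtime (no code source)"
                : src.getLocation().toString();
        return cls.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        // The concrete factory chosen by the JAXP lookup on this classpath.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println(describe(factory.getClass()));
        // The abstract API class itself, in case an old API jar shadows the JDK's.
        System.out.println(describe(DocumentBuilderFactory.class));
    }
}
```

Running this once from the command line and once from the JUnit launcher in
Eclipse should make any difference between the two classpaths visible.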

Bonus! Removing Jaxen also fixed an error I was experiencing with JPEG
metadata extraction using Drew Noakes' metadata-extractor (
https://github.com/drewnoakes/metadata-extractor).

Eventually all these jars will have to live in a webapp in the same
WEB-INF/lib directory, so I may not be out of the woods yet, but at least I
know where the problem is coming from.
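
If the jars do end up side by side in WEB-INF/lib, one possible workaround
(an assumption, not something tested in this thread) is to pin the JAXP
lookup to the parser bundled with the JDK through the standard
javax.xml.parsers.DocumentBuilderFactory system property. The implementation
class name below is the one shipped with Oracle/OpenJDK and may differ on
other JVMs. JdkParserPin and pinAndCreate() are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper class, not part of PDFBox or xmpbox.
class JdkParserPin {

    // Standard JAXP system property consulted first by newInstance().
    static final String JAXP_KEY = "javax.xml.parsers.DocumentBuilderFactory";
    // Built-in implementation on Oracle/OpenJDK (assumption for other JVMs).
    static final String JDK_IMPL =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    static DocumentBuilderFactory pinAndCreate() {
        // Note: system properties are JVM-wide, so this affects every webapp
        // running in the same container.
        System.setProperty(JAXP_KEY, JDK_IMPL);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            // The same call that failed at DomXmpParser.java:81 in the trace.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser rejected the feature", e);
        }
        return factory;
    }

    public static void main(String[] args) {
        System.out.println(pinAndCreate().getClass().getName());
    }
}
```

The property has to be set before the first parser is created. Removing or
excluding the conflicting jar, as described above, remains the cleaner fix.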

On Thu, Mar 9, 2017 at 1:33 PM, Thad Humphries <thad.humphries@gmail.com>
wrote:

> Yes, I can take a stab at that in a few days, after the crunch of my
> current project abates. I'll let you know when it's on GitHub. Thanks.
>
> On Thu, Mar 9, 2017 at 12:43 PM, Tilman Hausherr <THausherr@t-online.de>
> wrote:
>
>> Can you create a minimal but fully working project with maven? I.e. we'd
>> need code with main, and a pom. I mention this because an additional lib is
>> needed, unless I misunderstood.
>>
>> Tilman
>>
>>
>> Am 09.03.2017 um 16:51 schrieb Thad Humphries:
>>
>>> Here's my code. As I said, it is throwing an exception at "new
>>> DomXmpParser()" and I have no idea why:
>>>
>>>    protected JSONObject getPdfMetadata(byte [] buffer)
>>>        throws IOException, XmpParsingException, JSONException {
>>>      ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
>>>
>>>      JSONObject json = new JSONObject();
>>>      PDDocument document = null;
>>>      try {
>>>        document = PDDocument.load(bais);
>>>        PDDocumentCatalog catalog = document.getDocumentCatalog();
>>>        PDMetadata meta = catalog.getMetadata();
>>>
>>>        if (meta != null) {
>>>          DomXmpParser xmpParser = new DomXmpParser();  // throws exception
>>>          XMPMetadata metadata = xmpParser.parse(meta.createInputStream());
>>>
>>>          DublinCoreSchema dc = metadata.getDublinCoreSchema();
>>>          if (dc != null) {
>>>            JSONObject dcj = new JSONObject();
>>>            dcj.put("Title", dc.getTitle());
>>>            dcj.put("Description", dc.getDescription());
>>>            ...
>>>            json.put("Dublin", dcj);
>>>          }
>>>    ...
>>>
>>> My goal is to return a JSON formatted string to a browser, and display
>>> the formatted metadata to the user. So for now I'm getting around this
>>> exception from DomXmpParser by simply converting the metadata
>>> to JSON with JSON-java (https://github.com/stleary/JSON-java), and
>>> untangling the namespace, etc. on the browser side:
>>>
>>>      PDMetadata meta = catalog.getMetadata();
>>>
>>>        if (meta != null) {
>>>          InputStream is = meta.exportXMPMetadata();
>>>          ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>>          int read = 0;
>>>          byte [] bytes = new byte[8*1024];
>>>          while ((read = is.read(bytes)) != -1) {
>>>            baos.write(bytes, 0, read);
>>>          }
>>>          String string = new String(baos.toByteArray());
>>>          json = XML.toJSONObject(string);
>>>      ...
>>>
>>>
>>> On Wed, Mar 8, 2017 at 10:11 PM, Thad Humphries <thad.humphries@gmail.com>
>>> wrote:
>>>
>>>> When I run the org.apache.pdfbox.examples.pdmodel.ExtractMetadata
>>>> example, it works. However when I put the same code into my class, it
>>>> throws an exception when I call "DomXmpParser xmpParser = new
>>>> DomXmpParser();"  The trace is:
>>>>
>>>> java.lang.AbstractMethodError: javax.xml.parsers.DocumentBuilderFactory.setFeature(Ljava/lang/String;Z)V
>>>> at org.apache.xmpbox.xml.DomXmpParser.<init>(DomXmpParser.java:81)
>>>> at com.jthad.util.image.MetadataExtractor.getPdfMetadata(MetadataExtractor.java:170)
>>>> at com.jthad.util.image.TestMetadataExtractor.testPdf0(TestMetadataExtractor.java:41)
>>>> ...
>>>>
>>>> Line 81 in DomXmpParser.java is
>>>>
>>>> dbFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
>>>>
>>>> I am at a loss to understand how "new DomXmpParser()" works from the
>>>> command line but fails when called by a JUnit test in Eclipse.
>>>> ...
>>>>
>
> --
> "Hell hath no limits, nor is circumscrib'd In one self-place; but where we
> are is hell, And where hell is, there must we ever be" --Christopher
> Marlowe, *Doctor Faustus* (v. 121-24)
>

