pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: ExtractMetadata error
Date Thu, 09 Mar 2017 17:43:25 GMT
Can you create a minimal but fully working Maven project? I.e. we'd
need a class with a main method, and a pom.xml. I mention this because an
additional lib is needed, unless I misunderstood.
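
In the meantime, here is a rough diagnostic sketch that might help narrow
it down. An AbstractMethodError on DocumentBuilderFactory.setFeature
typically means that an old JAXP implementation (one that predates
setFeature) was picked up from the classpath instead of the one bundled
with the JDK. The class name JaxpDiagnostics and the helper describe() are
made up for illustration; only standard JDK calls are used. Running it both
from the command line and from the Eclipse JUnit launch should show whether
the two environments load different factories.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of PDFBox or xmpbox.
final class JaxpDiagnostics {

    /** Returns the class name and the location (jar or directory) it was loaded from. */
    public static String describe(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes that ship with the JDK usually have no code source.
        String location = (src == null || src.getLocation() == null)
                ? "JDK (no code source)"
                : src.getLocation().toString();
        return cls.getName() + " from " + location;
    }

    public static void main(String[] args) throws Exception {
        // The abstract API class; an old xml-apis jar on the classpath would show up here.
        System.out.println("API:  " + describe(DocumentBuilderFactory.class));

        // The concrete factory that JAXP selects in this environment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("Impl: " + describe(factory.getClass()));

        // The same call that fails at DomXmpParser.java:81 in the trace below.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        System.out.println("setFeature succeeded");
    }
}
```

If the "Impl" line points at a jar in the project (for example an old
Xerces or similar XML parser dependency) rather than the JDK, that jar is
the likely culprit.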

Tilman

Am 09.03.2017 um 16:51 schrieb Thad Humphries:
> Here's my code. As I said, it is throwing an exception at "new
> DomXmpParser()" and I have no idea why:
>
>    protected JSONObject getPdfMetadata(byte [] buffer)
>        throws IOException, XmpParsingException, JSONException {
>      ByteArrayInputStream bais = new ByteArrayInputStream(buffer);
>
>      JSONObject json = new JSONObject();
>      PDDocument document = null;
>      try {
>        document = PDDocument.load(bais);
>        PDDocumentCatalog catalog = document.getDocumentCatalog();
>        PDMetadata meta = catalog.getMetadata();
>
>        if (meta != null) {
>          DomXmpParser xmpParser = new DomXmpParser();  // throws exception
>          XMPMetadata metadata = xmpParser.parse(meta.createInputStream());
>
>          DublinCoreSchema dc = metadata.getDublinCoreSchema();
>          if (dc != null) {
>            JSONObject dcj = new JSONObject();
>            dcj.put("Title", dc.getTitle());
>            dcj.put("Description", dc.getDescription());
>            ...
>            json.put("Dublin", dcj);
>          }
>    ...
>
> My goal is to return a JSON-formatted string to a browser and display the
> formatted metadata to the user. So for now I'm getting around this
> DomXmpParser exception by simply converting the metadata
> to JSON with JSON-java (https://github.com/stleary/JSON-java), and
> untangling the namespace, etc. on the browser side:
>
>      PDMetadata meta = catalog.getMetadata();
>
>        if (meta != null) {
>          InputStream is = meta.exportXMPMetadata();
>          ByteArrayOutputStream baos = new ByteArrayOutputStream();
>          int read = 0;
>          byte [] bytes = new byte[8*1024];
>          while ((read = is.read(bytes)) != -1) {
>            baos.write(bytes, 0, read);
>          }
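>          // Decode explicitly rather than with the platform default charset;
>          // this assumes the XMP packet is UTF-8 encoded.
>          String string = new String(baos.toByteArray(), StandardCharsets.UTF_8);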
>          json = XML.toJSONObject(string);
>      ...
>
>
> On Wed, Mar 8, 2017 at 10:11 PM, Thad Humphries <thad.humphries@gmail.com>
> wrote:
>
>> When I run the org.apache.pdfbox.examples.pdmodel.ExtractMetadata
>> example, it works. However when I put the same code into my class, it
>> throws an exception when I call "DomXmpParser xmpParser = new
>> DomXmpParser();"  The trace is:
>>
>> java.lang.AbstractMethodError: javax.xml.parsers.DocumentBuilderFactory.setFeature(Ljava/lang/String;Z)V
>> at org.apache.xmpbox.xml.DomXmpParser.<init>(DomXmpParser.java:81)
>> at com.jthad.util.image.MetadataExtractor.getPdfMetadata(MetadataExtractor.java:170)
>> at com.jthad.util.image.TestMetadataExtractor.testPdf0(TestMetadataExtractor.java:41)
>> ...
>>
>> Line 81 in DomXmpParser.java is
>>
>> dbFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
>>
>> I am at a loss to understand how "new DomXmpParser()" works from the
>> command line but fails when called by a JUnit test in Eclipse.
>>
>> --
>> "Hell hath no limits, nor is circumscrib'd In one self-place; but where we
>> are is hell, And where hell is, there must we ever be" --Christopher
>> Marlowe, *Doctor Faustus* (v. 121-24)
>>
>
>

