pdfbox-users mailing list archives

From Maruan Sahyoun <sahy...@fileaffairs.de>
Subject Re: DomXmpParser: namespace not found
Date Thu, 09 Jul 2015 19:32:18 GMT
Hi,

> On 09.07.2015 at 18:13, Tilman Hausherr <THausherr@t-online.de> wrote:
> 
> On 09.07.2015 at 15:35, Allison, Timothy B. wrote:
>> From my perspective, it would be great to have a general XMP parser that also allows for some variance from the spec (PDFBOX-2855). We've been using jempbox for PDFs as well as images over on Tika, and it has worked well for us.
>> 
>> I'd prefer to continue using your XMP parser, but I understand if you need to limit what you're willing to take on.
>> 
>> I'll take a look at xmlgraphics, and I'll discuss the fallback option of moving jempbox into Tika with the Tika devs.
> 
> I had a quick look at the xmlgraphics XMP code; it would also require extra implementation work.
> 
> I don't mind having it in xmpbox (we have some non-PDF stuff in other places too), we just need a schema definition. Or the most complex possible file with that namespace. "All" there is to do then is to add a file in org.apache.xmpbox.schema.

Would it be possible to get the XMP files causing the exception, so we have something to test with?
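
In case it helps to pick out the relevant files: below is a minimal sketch, using only the JDK's DOM parser and not xmpbox itself, that lists the namespaces of the properties found on rdf:Description elements and reports those missing from a caller-supplied set. The class name XmpNamespaceLister, the SAMPLE packet and the ILLUSTRATIVE_KNOWN set are all made up for illustration; in particular, the set is not the list of namespaces xmpbox actually supports.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmpNamespaceLister {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hypothetical "known" set for the demo only; NOT xmpbox's real list.
    static final Set<String> ILLUSTRATIVE_KNOWN = new TreeSet<>(Arrays.asList(
            "http://purl.org/dc/elements/1.1/",
            "http://ns.adobe.com/xap/1.0/"));

    // Hand-written sample packet (assumption), using the two namespaces from the report.
    static final String SAMPLE =
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
          + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
          + "<rdf:Description rdf:about=\"\""
          + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
          + " xmlns:aux=\"http://ns.adobe.com/exif/1.0/aux/\""
          + " xmlns:lr=\"http://ns.adobe.com/lightroom/1.0/\""
          + " aux:Lens=\"50mm\">"
          + "<dc:format>image/jpeg</dc:format>"
          + "<lr:hierarchicalSubject><rdf:Bag><rdf:li>a|b</rdf:li></rdf:Bag>"
          + "</lr:hierarchicalSubject>"
          + "</rdf:Description></rdf:RDF></x:xmpmeta>";

    /** Collects the namespace URIs of all properties (attributes and child
     *  elements) of every rdf:Description element in the given XMP bytes. */
    static Set<String> propertyNamespaces(byte[] xmp) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmp));

            Set<String> result = new TreeSet<>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // Properties written in attribute form, e.g. aux:Lens="50mm".
                NamedNodeMap attributes = description.getAttributes();
                for (int a = 0; a < attributes.getLength(); a++) {
                    collect(attributes.item(a), result);
                }
                // Properties written in element form, e.g. <dc:format>.
                for (Node child = description.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        collect(child, result);
                    }
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read XMP as XML", e);
        }
    }

    /** Adds the node's namespace unless it is RDF, xmlns, xml or absent. */
    private static void collect(Node node, Set<String> out) {
        String ns = node.getNamespaceURI();
        if (ns != null && !RDF_NS.equals(ns)
                && !XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)
                && !XMLConstants.XML_NS_URI.equals(ns)) {
            out.add(ns);
        }
    }

    /** Returns the namespaces in 'found' that are not in 'known'. */
    static Set<String> unknownNamespaces(Set<String> found, Set<String> known) {
        Set<String> unknown = new TreeSet<>(found);
        unknown.removeAll(known);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> found = propertyNamespaces(SAMPLE.getBytes(StandardCharsets.UTF_8));
        System.out.println("Unknown: " + unknownNamespaces(found, ILLUSTRATIVE_KNOWN));
    }
}
```

Running it over the XMP extracted from the failing images should show which namespaces are involved in each file, so the smallest file covering the most properties can be attached.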

BR
Maruan

> 
> Tilman
> 
>> 
>> Thank you.
>> 
>> Cheers,
>> 
>>                Tim
>> 
>> -----Original Message-----
>> From: Maruan Sahyoun [mailto:sahyoun@fileaffairs.de]
>> Sent: Thursday, July 09, 2015 4:56 AM
>> To: users@pdfbox.apache.org
>> Subject: Re: DomXmpParser: namespace not found
>> 
>> Hi,
>> 
>>> On 08.07.2015 at 22:42, Tilman Hausherr <THausherr@t-online.de> wrote:
>>> 
>>> On 08.07.2015 at 17:22, Allison, Timothy B. wrote:
>>>> All,
>>>> Apologies for the idiocy I'm about to reveal (well, that won't be a revelation to anyone, really), but is there an obvious solution for this kind of error:
>>>> 
>>>> Caused by: org.apache.xmpbox.xml.XmpParsingException: Cannot find a definition for the namespace http://ns.adobe.com/lightroom/1.0/
>>>>                 at org.apache.xmpbox.xml.DomXmpParser.checkPropertyDefinition(DomXmpParser.java:848)
>>>>                 at org.apache.xmpbox.xml.DomXmpParser.parseChildrenAsProperties(DomXmpParser.java:290)
>>>>                 at org.apache.xmpbox.xml.DomXmpParser.parseDescriptionRoot(DomXmpParser.java:234)
>>>>                 at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:198)
>>>>                 at org.apache.xmpbox.xml.DomXmpParser.parse(DomXmpParser.java:105)
>>>>                 at org.apache.tika.parser.image.xmp.JempboxExtractor.parse(JempboxExtractor.java:59)
>>>> 
>>>> On a handful of image files in our test docs on Tika, I'm getting this with:
>>>> 
>>>> http://ns.adobe.com/lightroom/1.0/
>>>> http://ns.adobe.com/exif/1.0/aux/
>>>> 
>>> These namespaces are not supported by xmpbox. We've had this problem with another namespace (I can't remember which one), and it wasn't possible to support it because we couldn't find a schema definition.
>>> 
>>> But you say these are image files. So this isn't about pdf xmp.
>> xmpbox is targeted at PDF/A-1. So I think we should discuss extending it to support the metadata requirements of other PDF standards as well as generic XMP use cases, so that we again have a generic XMP library. OTOH there is org.apache.xmlgraphics.xmp.
>> 
>> WDYT?
>> 
>> BR
>> Maruan
>> 
>> 
>>> Tilman
>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>> For additional commands, e-mail: users-help@pdfbox.apache.org
>>> 
>> 
> 

