incubator-sanselan-dev mailing list archives

From Jeremias Maerki <>
Subject Re: Metadata use by Apache Java projects
Date Tue, 20 Nov 2007 08:20:35 GMT
On 20.11.2007 08:24:01 Philipp Koch wrote:
> >    Jeremias, it sounds like you are considering a new project which can
> > translate data from many formats (read by a variety of projects) into
> > XMP.  That sounds great!
> hmm, i am not sure if (yet) another  new project should be set up for
> this since the tika project already offers all the "infrastructure" to
> read meta data from various formats. from my point of view, the tika
> project should offer some kind of "meta data to xmp" translator.

Philipp, I'm not talking about just reading metadata, but also writing
it. Sanselan supports creating new TIFF, JPEG etc. files. FOP creates
new PDF, SVG etc. files. These processes all need metadata. Tika is a
metadata extraction kit. I'm talking about something more general. If
the common metadata storage model, assuming we can agree on one, ends up
as a subproject/subproduct of Tika, I'm cool with that. But I'm not sure Tika
could cover all this translation functionality for all the projects
using metadata. That's something the individual document format
libraries will be much better at. Tika is more of an aggregator.

> >    Sanselan could not use XMP internally to represent metadata,
> > though.  Sanselan's goal is to read & write metadata (such as EXIF
> > metadata) preserving not just tag values but directory structure,
> > field order, field location, etc.
> this makes sense to me, since i have only seen embedded xmp in adobe's
> products that are using the pdf "file format" to store its data
> (acrobat and illustrator at least)

Sure, the adoption of XMP is somewhat limited. But I've worked with it
for some time now and I've experienced the benefit. Our adopting it
could actually improve acceptance elsewhere.


Jeremias Maerki
