commons-user mailing list archives

From Mark Fortner
Subject Re: [vfs] File Metadata
Date Wed, 15 Feb 2006 18:49:43 GMT
Mario Ivankovits wrote:
> Hi!
>> The one problem I see with the service API is that if I'm trying to
>> find metadata for a FileObject, looking in the service API isn't an
>> obvious thing to do.
> But it is the most powerful solution.
> I dont want to change/extend the interface for every single thing we can
> imagine in the future.
> And one should be able to add commands by simply dropping in a jar.
I agree that we shouldn't have individual accessors/mutators for 
everything that you might want to get
out of a file (i.e. getAuthor, getCreationDate).

What if we had something like this:

    + static getMetadata(FileObject file):Map
    + static getKeys(FileObject file):Set
| <<uses>>

    + static getInstanceByMimeType(String mimetype):MetadataReader
    + static getInstanceByExtension(String ext):MetadataReader
    + static getInstance(FileObject obj):MetadataReader

| <<creates>>

    + getMetadata(): Map<String, String>
    + getMetadataKeys():Set<String> -- allows you to see what metadata is available
    + getMimetypes():List<String>

|  <<implements>>

ImageMetadataReader [org.apache.commons.vfs.metadata.image]
SoundMetadataReader [org.apache.commons.vfs.metadata.sound]
OpenOfficeMetadataReader [org.apache.commons.vfs.metadata.openoffice]
MicrosoftOfficeMetadataReader [org.apache.commons.vfs.metadata.poi]

Presumably one could also add writers for these metadata types using a 
similar set of classes and interfaces.
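
To make the diagram a bit more concrete, here is a minimal Java sketch 
of how the three layers could fit together.  Several things in it are 
assumptions and not part of the proposal: the names FileMetadata and 
MetadataReaderFactory (the diagram leaves those two boxes unnamed), the 
FileRef stand-in that replaces VFS's FileObject so the code compiles 
without VFS on the classpath, and the toy ImageMetadataReader, which 
only echoes the file name instead of parsing anything.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Stand-in for VFS's FileObject so the sketch compiles alone (assumption). */
interface FileRef {
    String getName();
}

/** The reader interface from the diagram: one reader instance per file. */
interface MetadataReader {
    Map<String, String> getMetadata();
    Set<String> getMetadataKeys();
    List<String> getMimetypes();
}

/** Toy reader; a real one would parse the image headers. */
class ImageMetadataReader implements MetadataReader {
    private final FileRef file;

    ImageMetadataReader(FileRef file) { this.file = file; }

    public Map<String, String> getMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("name", file.getName()); // placeholder value only
        return metadata;
    }

    public Set<String> getMetadataKeys() { return getMetadata().keySet(); }

    public List<String> getMimetypes() { return Arrays.asList("image/jpeg", "image/png"); }
}

/** Hypothetical name for the unnamed factory box in the diagram. */
final class MetadataReaderFactory {
    private static final Map<String, Function<FileRef, MetadataReader>> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("jpg", ImageMetadataReader::new);
        BY_EXTENSION.put("png", ImageMetadataReader::new);
    }

    private MetadataReaderFactory() {}

    /** Returns a reader for the file, or null if no reader handles its extension. */
    static MetadataReader getInstance(FileRef file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<FileRef, MetadataReader> constructor = BY_EXTENSION.get(ext);
        return constructor == null ? null : constructor.apply(file);
    }
}

/** Hypothetical name for the unnamed top box: the one obvious entry point. */
final class FileMetadata {
    private FileMetadata() {}

    static Map<String, String> getMetadata(FileRef file) {
        MetadataReader reader = MetadataReaderFactory.getInstance(file);
        return reader == null ? Collections.<String, String>emptyMap() : reader.getMetadata();
    }

    static Set<String> getKeys(FileRef file) {
        return getMetadata(file).keySet();
    }
}
```

A caller would only ever touch the top class, e.g. 
FileMetadata.getMetadata(file), and would get an empty map for file 
types that no reader handles.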

These classes could invoke services under the hood, but I think the 
metadata API should be high enough in the package structure, and have 
obvious enough names, that people don't have to go hunting.  I've found 
that if users can't find something within 5-10 minutes they figure it's 
not there, and either give up on the API or write their own, neither of 
which we would want them to do.

>> Most people when they're starting to learn VFS are going to look for
>> some method in the FileObject (or if they're clever in the
>> FileContentInfo).  Either of these places are logical places to look
>> for metadata methods.
> But once they stepped into the service API it should be easily
> understandable, no?
> And as you say, it isnt that a new concept.

The trick is getting them to "step into" the service API to begin with 
-- it would require them to think of metadata as a service.  It's not 
something that naturally occurs to people to do and so they would 
probably never think to look in a services package for metadata code.  
It isn't a new concept; however, its implementation in JAF left a lot to 
be desired and was difficult for a lot of people to understand.  This 
is the primary reason it doesn't get used much.  It still gives me 
headaches when I look at the docs for it. :-)

An org.apache.commons.vfs.metadata package would be fairly obvious to 
most people.

>> Any ideas about how we could make it easier for them?
> Docs, Wiki, Mailinglist (in this order, I hope ;-) )
All of which are good. But most people only check them after they 
haven't been able to find it in the Javadocs under some intuitive 
package name. :-)

> Think about how powerful it could be, given the following three things
> share the same base class
>> Open Office metadata
>> Microsoft Office metadata
>> MP3/AAC/Ogg metadata
> e.g. DocumentInfo which provides something like (title, author, ...)
> one can simply lookup  DocumentInfo.class and get these informations. If
> one drop in a jar to extract these data from e.g. java files the code
> will use it in the second.
> I wont say it isnt possible to do this by extending the API, but I think
> it will bloat it.
Is the DocumentInfo some other interface you're thinking of?  If so, 
what's the difference between FileContentInfo and DocumentInfo?

I think most of the "code bloat" would be fairly small. Basically a 
single new package, and a single method in an interface that returns 
metadata for specific mimetypes. The actual implementations are simply 
adapters that implement the interface by making calls to existing APIs 
capable of reading file metadata.  In the case of Open Office, that's a 
fairly simple matter of looking at the meta.xml file inside the Open 
Office zip file.  For images, there are a couple of different ways of 
getting at this data: either through Drew Noakes' metadata-extractor API 
or through JAI.  Finally, POI can extract Microsoft Office document 
metadata.
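
As an illustration of how thin such an adapter can be, here is a rough 
Java sketch of the Open Office case using only the JDK's zip and DOM 
classes.  The class name OpenOfficeMetaXml and its read method are 
hypothetical.  The sketch assumes the metadata sits in child elements 
of office:meta (for example dc:title), keeps only elements that carry 
text, lets a repeated element name overwrite the earlier value, and 
does no hardening against hostile XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads meta.xml out of an Open Office document. */
final class OpenOfficeMetaXml {
    private OpenOfficeMetaXml() {}

    /** Returns element name -> text for the children of office:meta, or an empty map. */
    static Map<String, String> read(InputStream document) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(document)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("meta.xml".equals(entry.getName())) {
                    return parse(copy(zip));
                }
            }
        }
        return new LinkedHashMap<>(); // no meta.xml in the archive
    }

    /** Buffers the current zip entry so the XML parser cannot close the zip stream. */
    private static byte[] copy(InputStream in) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    private static Map<String, String> parse(byte[] xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        // The default factory is not namespace-aware, so tags match by qualified name.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        NodeList metas = doc.getElementsByTagName("office:meta");
        if (metas.getLength() == 0) {
            return result;
        }
        NodeList children = metas.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            String text = child.getTextContent();
            if (text != null && !text.trim().isEmpty()) {
                result.put(child.getNodeName(), text.trim());
            }
        }
        return result;
    }
}
```

The returned map fits the Map<String, String> shape of getMetadata() 
in the diagram, so an OpenOfficeMetadataReader could simply delegate 
to a helper like this.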

Are you anticipating that you'll have some sort of "service discovery" 
mechanism that will automatically register all services found in the 
classpath and make them available?  If so, then this too would require 
some work to make it easy for users to use.  There would need to be some 
mechanism for the user to install supporting JARs needed for specific 
metadata service providers.
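
For comparison, a classpath-based discovery mechanism of the kind 
asked about above might look roughly like the following Java sketch.  
It uses java.util.ServiceLoader (available from Java 6 on) purely as 
a stand-in; nothing here is VFS's actual mechanism, and the 
MetadataProvider interface and MetadataProviderRegistry class are 
hypothetical names.  Each provider JAR would need a 
META-INF/services entry named after the interface for the loader to 
find it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical service interface that each drop-in JAR would implement. */
interface MetadataProvider {
    List<String> getMimetypes();
    Map<String, String> read(InputStream content) throws IOException;
}

/** Hypothetical registry that indexes providers by the mimetypes they claim. */
final class MetadataProviderRegistry {
    private final Map<String, MetadataProvider> byMimeType = new HashMap<>();

    /** Registers every provider that ServiceLoader finds on the classpath. */
    static MetadataProviderRegistry discover() {
        MetadataProviderRegistry registry = new MetadataProviderRegistry();
        for (MetadataProvider provider : ServiceLoader.load(MetadataProvider.class)) {
            registry.register(provider);
        }
        return registry;
    }

    /** Also allows explicit registration; a later provider replaces an earlier one. */
    void register(MetadataProvider provider) {
        for (String mimeType : provider.getMimetypes()) {
            byMimeType.put(mimeType, provider);
        }
    }

    /** Returns the provider for the mimetype, or null if none is installed. */
    MetadataProvider forMimeType(String mimeType) {
        return byMimeType.get(mimeType);
    }
}
```

With no provider JARs installed, discover() returns an empty registry 
and every lookup yields null, which is exactly the situation a new 
user would hit if the providers were not bundled.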

I believe, though, that most of what I've outlined is so standard and 
generic that it should be part of the standard VFS distribution rather 
than available through additional downloads.
I think people usually want and expect everything in a single download, 
rather than having to make choices about which service providers they 
want.  The existing file system service providers are a good example of 
this.  Right now you have to explicitly download and install additional 
jars to get some of the functionality that you want.  It would be 
easier if everything you needed to get started were available in a 
single download, or with a single Ant "install" target.

Hope this clarifies things a bit. Sorry for the ASCII UML diagram. :-)

