commons-user mailing list archives

From Jörg Schaible
Subject RE: JMimeMagic (was [fileUpload] file content-type)
Date Wed, 19 Apr 2006 14:45:35 GMT
Andrea Spinelli wrote on Wednesday, April 19, 2006 4:05 PM:

> Jörg Schaible wrote:
>> 1) It is definitely not possible to build on jmimemagic code because
>> of licensing reasons
> Yep, but Mark wants to start a new one - reusing ideas is not
> forbidden ;-)
>> 2) Although jMimeMagic claims to use an imported magic file
>> itself, its magic.xml misses a lot of formats (e.g. tiff)
>> that have been present in file magic for ages
> Mark and his friends [including me, as far as I have some spare time]
> could read the files magic and magic.mime and generate
> something similar to magic.xml (or better).

The problem with an automated process here is that
a) due to the limitations with variable lengths, most magic bytes for the text formats have
to be revised by hand
b) the magic bytes in magic and magic.mime differ for the same formats
c) a+b prevent the automated process from being repeated

So the question is simply whether writing a generator is worth the time for a one-time task.
A further question is whether the content of the two files should be merged at all:
for "image/gif", for example, it does not matter which of the two GIF formats is used. If we also
want to support a more informative textual format description, the matching trees should be
kept separate internally, even if we have a single configuration file.
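To illustrate the separation, here is a minimal sketch with two independent lookups over the same bytes. All class and method names are made up for this mail, and the rules are hard-coded only to keep it short; a real implementation would read them from the configuration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names are illustrative, not taken from jMimeMagic or file.
final class GifLookupSketch {
    // True if data begins with the ASCII bytes of magic.
    private static boolean startsWith(byte[] data, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (data.length < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    // MIME lookup: one coarse rule is enough, the GIF version is irrelevant here.
    static String mimeType(byte[] head) {
        return startsWith(head, "GIF8") ? "image/gif" : null;
    }

    // Description lookup: kept separate so it can tell the two versions apart.
    static String describe(byte[] head) {
        if (startsWith(head, "GIF87a")) {
            return "GIF image data, version 87a";
        }
        if (startsWith(head, "GIF89a")) {
            return "GIF image data, version 89a";
        }
        return null;
    }
}
```

The MIME lookup stays small and fast, while the description lookup can grow as detailed as needed without slowing down callers that only want the content type.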

>> 3) Debug it! The code was definitely not designed for speed
>> as one would expect from a utility that should do such
>> examinations on the fly
> I agree - maybe you can produce a checklist of points not to be
> missed?    a.

For the design, keep in mind that some formats don't have a header; they simply append info
(e.g. mp3 tags).
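As a rough sketch of what a trailer check could look like, take the ID3v1 tag, which occupies the last 128 bytes of a file and starts with the ASCII letters "TAG". The class and method names are hypothetical, and the whole file is passed as a byte array only for brevity; a real matcher would rather seek to the end of the stream.

```java
// Hypothetical sketch: a matcher that looks at the end of the data instead of the start.
final class TrailerSketch {
    // An ID3v1 tag is a fixed 128-byte block at the very end of the file.
    static final int ID3V1_LENGTH = 128;

    static boolean hasId3v1Trailer(byte[] file) {
        if (file.length < ID3V1_LENGTH) {
            return false;
        }
        int start = file.length - ID3V1_LENGTH;
        return file[start] == 'T' && file[start + 1] == 'A' && file[start + 2] == 'G';
    }
}
```

The consequence for the design is that a matcher interface cannot assume it only ever sees the first few bytes of the input.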

If you look at file magic, you can see that it uses precompiled versions of the two files.
This might be an option too. Another approach would be to generate a lexer from the configuration.

A general problem is the ordering of the matchers: a more general matcher must not clobber a more
specialized one. This problem gets worse with multiple configuration files.
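One possible way to handle the ordering, sketched below, is to sort prefix matchers so that longer (more specific) magic strings are tried first. This is only one heuristic and an assumption on my side; the names are invented, and real matchers with offsets or masks would need a better measure of specificity than length.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: order matchers so that specialized ones win over general ones.
final class MatcherOrderSketch {
    static final class PrefixMatcher {
        final byte[] magic;
        final String label;

        PrefixMatcher(String magic, String label) {
            this.magic = magic.getBytes(StandardCharsets.US_ASCII);
            this.label = label;
        }

        boolean matches(byte[] data) {
            if (data.length < magic.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (data[i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns a copy sorted by magic length, longest first (stable for equal lengths).
    static List<PrefixMatcher> mostSpecificFirst(List<PrefixMatcher> matchers) {
        List<PrefixMatcher> sorted = new ArrayList<>(matchers);
        sorted.sort(Comparator.comparingInt((PrefixMatcher m) -> m.magic.length).reversed());
        return sorted;
    }

    // Returns the label of the first matcher that accepts the data, or null.
    static String firstMatch(List<PrefixMatcher> matchers, byte[] data) {
        for (PrefixMatcher m : matchers) {
            if (m.matches(data)) {
                return m.label;
            }
        }
        return null;
    }
}
```

With several configuration files the sort would have to run over the merged list, otherwise a general matcher from one file can still hide a specialized one from another.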

An application typically deals with the same mime types all the time. A user should be able
to define which formats should be looked for at all and in what sequence (maybe via a priority).
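A sketch of how that could look from the caller's side: the application passes the MIME types it cares about, in the order they should be tried, and everything else is skipped. The registry, its three entries and all names are assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: only the formats the caller asks for are checked, in the caller's order.
final class PrioritySketch {
    // Illustrative registry: MIME type -> ASCII magic prefix.
    static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("application/pdf", "%PDF-");
        KNOWN.put("image/gif", "GIF8");
        KNOWN.put("application/zip", "PK");
    }

    // Tries the wanted MIME types in the given order; unknown or unwanted types are never tested.
    static String detect(byte[] head, List<String> wanted) {
        // ISO-8859-1 maps every byte to exactly one char, so prefix checks stay byte-accurate.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (String mime : wanted) {
            String magic = KNOWN.get(mime);
            if (magic != null && text.startsWith(magic)) {
                return mime;
            }
        }
        return null;
    }
}
```

An upload handler that only accepts GIF and PDF would then never pay for the dozens of other matchers in the configuration.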

Support a callback/monitoring/listener mechanism that fires events if a matcher fails or succeeds.
This helps to optimize the sequence (maybe even on the fly).
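A minimal sketch of such a listener, assuming a simple callback interface and a hit counter that proposes a new sequence. The interface, the class names and the string-based matching are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: matchers report success or failure to a listener.
final class ListenerSketch {
    interface MatchListener {
        void onResult(String matcherName, boolean matched);
    }

    // Counts successful matches and can propose a new order, most frequent first.
    static final class HitCounter implements MatchListener {
        final Map<String, Integer> hits = new HashMap<>();

        @Override
        public void onResult(String matcherName, boolean matched) {
            if (matched) {
                hits.merge(matcherName, 1, Integer::sum);
            }
        }

        List<String> reorder(List<String> names) {
            List<String> sorted = new ArrayList<>(names);
            sorted.sort(Comparator.comparingInt((String n) -> hits.getOrDefault(n, 0)).reversed());
            return sorted;
        }
    }

    // Tries the matchers in the given order and fires an event for every attempt.
    static String detect(String head, List<String> order, Map<String, String> magics,
                         MatchListener listener) {
        for (String name : order) {
            boolean matched = head.startsWith(magics.get(name));
            listener.onResult(name, matched);
            if (matched) {
                return name;
            }
        }
        return null;
    }
}
```

Reordering purely by hit count can conflict with the rule that a specialized matcher has to run before a general one, so an on-the-fly optimization would have to respect that constraint.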

This is just a summary of thinking out loud, though ...

- Jörg
