commons-user mailing list archives

From Mark <elihusma...@gmail.com>
Subject Re: JMimeMagic (was [fileUpload] file content-type)
Date Wed, 19 Apr 2006 11:09:25 GMT
Great feedback, guys. I think I will start writing a new project from
scratch. How can I go about getting set up with Jakarta Commons? Do
I need to 'apply' to start a new project?

Thanks again!!

On 4/19/06, Jörg Schaible <Joerg.Schaible@elsag-solutions.com> wrote:
> Hi Markus,
>
> Jörg Schaible wrote on Wednesday, April 19, 2006 8:46 AM:
>
> > Hi Markus,
> >
> > Markus Härnvi wrote on Wednesday, April 19, 2006 8:47 AM:
> >
> >> Hi!
> >>
> >>> Starting from scratch would possibly be the best anyway. I
> >>> had it also on my todo list on a very low priority ... but
> >>> only because I found that jMimeMagic has a really bad
> >>> implementation - extremely slow and not working correctly. I
> >>> have a good pile of image files it does not detect. Main
> >>> reason is that the implementation is simply wrong. The
> >>> original magic files have a clear idea of precedence of
> >>> patterns - this has been lost completely in the
> >>> conversion/implementation of jMimeMagic.
> >>>
> >>> - Jörg
> >>>
> >>
> >> Using the original magic file and parse it in Java also makes it
> >> easier to keep it updated. Just add the newest magic file to the jar
> >> file and we are done.
> >
> > That would have been my approach also. I was just not sure,
> > whether we should bundle the magic file or try to locate it
> > (this is the interesting part and highly system dependent).
> > And a user might have an additional magic file in its home -
> > at least this can be located.
>
> After looking into the magic files (magic and magic.mime) I am somewhat
> disappointed. While file magic is good at binary formats with fixed headers,
> its definition language is poor for string based formats, e.g. rules for
> detecting XML & XSL:
>
> ===== %< =====
> 0       string/cb       \<?xml                  XML document text
> 0       string          \<?xml\ version "       XML
> 0       string          \<?xml\ version="       XML
> >15     string          >\0                     %.3s document text
> >>23    string          \<xsl:stylesheet        (XSL stylesheet)
> >>24    string          \<xsl:stylesheet        (XSL stylesheet)
> 0       string/b        \<?xml                  XML document text
> 0       string/cb       \<?xml                  broken XML document text
> ===== %< =====
>
> This is quite poor. The second line is invalid XML. It looks at offset 23
> or 24 for "<xsl:stylesheet", totally ignoring the fact that the offset might
> be quite different if the XML declaration contains an encoding attribute, or
> depending on the whitespace and line endings. See detection of xml mime
> formats:
>
> ===== %< =====
> 0       string          \<?xml
> >38     string          \<\!DOCTYPE\040svg      image/svg+xml
> 0       string          \<?xml                  text/xml
> ===== %< =====
>
> Again I am quite sure, that a lot of SVG documents are not recognized.
>
> Main problem is that the format specification cannot deal with variable
> length. See "man magic" for the format definition. You cannot express that
> a file with an XML declaration followed by a non-empty line with a DOCTYPE
> declaration for SVG is "image/svg+xml".
>
> Bottom line: I am no longer sure, if a mime detection based on the
> definitions of file magic is really a good idea :-/
>
> - Jörg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>
>
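[Editor's illustration, not part of the original message.] To make the
variable-length problem from the quoted mail concrete, here is a minimal Java
sketch. It emulates a fixed-offset rule like ">38 string <!DOCTYPE svg" and
compares it with a check that skips the XML declaration, whatever its length.
The class name XmlSniffer and the methods matchesAt and detect are hypothetical
names made up for this example; they are not jMimeMagic or Commons API. The
regular expressions are a simplification and not a full XML parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: fixed-offset matching versus skipping the XML declaration. */
final class XmlSniffer {

    // XML declaration at the very start of the data: "<?xml ... ?>".
    private static final Pattern XML_DECL =
            Pattern.compile("\\A<\\?xml\\s[^>]*\\?>");

    // After the declaration: optional whitespace and comments, then either
    // an SVG DOCTYPE or an <svg> root element.
    private static final Pattern SVG_START = Pattern.compile(
            "\\A(?:\\s|<!--.*?-->)*(?:<!DOCTYPE\\s+svg\\b|<svg[\\s>/])",
            Pattern.DOTALL);

    private XmlSniffer() {
    }

    /** Emulates a magic rule of the form "offset string literal". */
    static boolean matchesAt(byte[] head, int offset, String literal) {
        byte[] lit = literal.getBytes(StandardCharsets.ISO_8859_1);
        if (offset < 0 || offset + lit.length > head.length) {
            return false;
        }
        for (int i = 0; i < lit.length; i++) {
            if (head[offset + i] != lit[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns "image/svg+xml", "text/xml", or null if the data does not
     * start with an XML declaration. Only the first bytes of a file are needed.
     */
    static String detect(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher decl = XML_DECL.matcher(text);
        if (!decl.find()) {
            return null;
        }
        String rest = text.substring(decl.end());
        return SVG_START.matcher(rest).find() ? "image/svg+xml" : "text/xml";
    }
}
```

With a declaration of exactly <?xml version="1.0" standalone="no"?> followed
by a single newline, the DOCTYPE starts at offset 38 and the fixed-offset rule
matches. With <?xml version="1.0" encoding="UTF-8"?> the DOCTYPE moves to
offset 39, so matchesAt(..., 38, "<!DOCTYPE svg") fails while detect still
returns "image/svg+xml".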
