commons-user mailing list archives

From Jörg Schaible <Joerg.Schai...@Elsag-Solutions.com>
Subject OT: JMimeMagic (was [fileUpload] file content-type)
Date Tue, 30 May 2006 07:10:58 GMT
Hi David,

I realize that we're getting off-topic for jakarta commons here ... ;-)

David Castro wrote on Tuesday, May 30, 2006 7:36 AM:

> Jörg Schaible wrote:
[snip]
>> After a quick look over the package you get the impression,
>> that you imported the magic codes of file magic into the
>> project. And then you're quite astonished, if the library
>> does not detect simple formats (e.g. TIFF, Windows BMP), that
>> are no problem for the C counterpart. This is IMHO a problem,
>
> I did use the "magic" file to assist in generating the magic.xml file
> bundled with the project.  You'll note that I have some, but not all of
> the matches cleaned up and working.  Actually, the file command will
> sometimes have incorrect matches itself, which I didn't want to
> inherit.  So, I started with a small set of documents that I generated
> and ran them through unit tests to verify them.
> 
> I would never be astonished that an alpha piece of open source software
> doesn't work exactly as expected or is limited in its out-of-box
> state.  I only moonlight as an open source developer as much as I'd like
> it to be my full-time job ;)

I assume most people here are in the same boat including myself.

>> because there's simply no documentation, that states
>> something else. When I detected jMimeMagic I just thought to
>> use it as a black box.
>
> Yeah, if you are looking for something that doesn't require a bit of
> elbow grease, jMimeMagic wouldn't be an optimal solution since it is
> early alpha open source software.  That's pretty normal I think.

And it's pretty normal for users to expect the opposite :D

> Nothing else existed out there when I started this project and I only
> had so many hours to devote to it.  But let's get the engine revved up
> and make it more out-of-the-box-friendly.

:)

[snip]

>> But you could not decide, what you wanted to implement.
>> See, file magic has two magic files, one to produce a format
>> description and one for the mime type. Your implementation
>> mixes the two approaches.
>
> I decided exactly what I wanted to implement and what I wanted to
> prepare for (at least at the time).  You're assuming that my intention
> was to simply duplicate the "file" utility, which isn't the case.
> Determining mime type was really only one of my intentions.

Well, by naming your project j*Mime*Magic, you imply something ;-)

> More important to me was actually determining the specific type and
> state of content in a stream of data.  It was initially built as a
> helper library for a malware detection project.
>
>> Mime type detection is normally an action that should
>> happen *fast*, but if I request the mime type for an MP3 you
>> evaluate all the nested matchers that are totally moot for
>> the mime type.
> 
> Now you are talking about optimization based on one of the specific
> uses of the library.

No, I am talking about your attempt to target two different things at the same time, and you
cannot do both efficiently.

> I agree with you that there are some things that can certainly be
> done better/more efficiently.  Those need to be identified and
> patched, but let's try not to throw the baby out with the bath water.

Split the result of the parser, create specialized matchers for mime type detection and descriptive
format detection. If you have the need to detect a mime type it is typically something you
wanna do on the fly - and fast.
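
To illustrate what I mean by a specialized matcher for the MIME type only, here is a minimal sketch. It is not jMimeMagic code: the class name FastMimeSniffer and its detect method are made up for this example, and it assumes that the first few bytes of the stream are already in memory. It checks only the well-known top-level signatures (BMP, TIFF, PNG) and never descends into nested, descriptive matchers.

```java
/** Hypothetical helper, not part of jMimeMagic: top-level signature checks only. */
final class FastMimeSniffer {

    // Well-known leading byte signatures ("magic numbers").
    private static final byte[] BMP = {'B', 'M'};
    private static final byte[] TIFF_LE = {'I', 'I', 0x2A, 0x00};
    private static final byte[] TIFF_BE = {'M', 'M', 0x00, 0x2A};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    private FastMimeSniffer() {}

    /** Returns a MIME type for the given header bytes, or null if nothing matches. */
    static String detect(byte[] header) {
        if (startsWith(header, PNG)) return "image/png";
        if (startsWith(header, TIFF_LE) || startsWith(header, TIFF_BE)) return "image/tiff";
        if (startsWith(header, BMP)) return "image/bmp";
        return null;
    }

    // True if data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A descriptive format detector could still run the full nested matcher tree afterwards; the point is only that the MIME lookup does not have to pay for it.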

>> Looking at the code:
>> 
[snip]
>> - your code is linked to Log4J. This is not good for
>> libraries. See, some of our customers use their own
>> logging implementations, but with commons-logging you can at
>> least write an easy bridge
>> 
> Yup, I agree with you.  Nobody has been pounding on the door asking
> for it and I had enough work on other projects to not concern myself
> too deeply with it.

Demand, demand :)

>> - you never guard log.debug with log.isDebugEnabled() - and you create *a
>> lot* of debug output
> 
> Yup, certainly an area for making the library more efficient.  Again,
> completely aware of the issue...just haven't fixed it yet.
>
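
For anyone following along, the guard looks roughly like the sketch below. To keep it self-contained it uses java.util.logging instead of Log4J (there the check is isDebugEnabled()); the class GuardedLogging, its hexDump helper and the dumpsBuilt counter are invented for this illustration and are not taken from jMimeMagic.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example class, not taken from jMimeMagic. */
final class GuardedLogging {
    static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    /** Counts how often the expensive dump is built, to make the guard's effect visible. */
    static int dumpsBuilt = 0;

    /** Deliberately costly message construction: one formatted string per byte. */
    static String hexDump(byte[] data) {
        dumpsBuilt++;
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static void match(byte[] data) {
        // Guard: the dump is only built when FINE (debug-level) output is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("matching against: " + hexDump(data));
        }
    }
}
```

Without the guard, the string concatenation and the hex dump are computed on every call, even if the message is thrown away afterwards.
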
>> - file magic has also its limits as already explained in
>> this thread. You already introduced regexp support, but you
>> don't use it properly e.g. for the HTML types so far
> 
> Definitely limits, and as I mentioned I was already moving and have
> already coded adjustments to support more of a pluggable matcher
> architecture. 

This is the functionality, that *I* am not that interested in ... the mime type can normally
be detected quite easily with the standard patterns.

> And if my HTML regex matcher is
> broken...

Well, you have some of those non-regex, fixed-position HTML matching definitions in your magic.xml
that are also present in file magic's definitions and that don't work too well.
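
As a rough sketch of why a regex works better here than a fixed offset: a document may start with whitespace, and the tag may be written in any letter case. The class HtmlSniffer and the pattern below are my own illustration, not what jMimeMagic or file magic actually use, and they assume the header bytes can be read as ISO-8859-1 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of jMimeMagic. */
final class HtmlSniffer {
    // Optional leading whitespace, then a doctype or an <html> tag, in any letter case.
    private static final Pattern HTML_START = Pattern.compile(
            "\\s*(?:<!DOCTYPE\\s+html\\b|<html[\\s>])", Pattern.CASE_INSENSITIVE);

    private HtmlSniffer() {}

    /** Returns true if the header bytes begin like an HTML document. */
    static boolean looksLikeHtml(byte[] header) {
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String text = new String(header, StandardCharsets.ISO_8859_1);
        // lookingAt() anchors the match at the start of the input.
        return HTML_START.matcher(text).lookingAt();
    }
}
```

A fixed-position rule such as "the bytes at offset 0 are <html" misses both a leading newline and an upper-case <HTML>, while the pattern above accepts them.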

> please send me a patch =)  I've been calling for folks to help build
> out a complete set of matchers for more content types, but with
> limited responses.

Just to clarify, when I first looked at jMimeMagic, it was just some days before you posted
your call for help. So the project looked to me like a lot of other abandoned projects on
SF with a one-time dump of some experimental code. Therefore I wanna apologize for the overall
bad impression I gave of your project in one of my first postings in this thread.

> I usually just scratch my own itches.  I've also determined that I am
> a pretty lousy mind reader ;)

:)

>> OK, some of the problems would have been solved by
>> providing my own magic.xml file. E.g. one of my mistakes with
>> the library was that I assumed that the magic file was read
>> every time you create a Magic instance and you would have to
>> synchronize the initialization of the instance if you want
>> to share it. This assumption was wrong, but I only found that out
>> by looking at the code - not by reading the javadocs.
> 
> Yeah...documentation is the first to go =(  I try to keep my projects
> clean, organized, and as simple as possible though.  So if you browse,
> you should get a good feel for what is going on.  It's not always
> beautiful or elegant, but you shouldn't find any obfuscated code...heh
> 
> Thanks for the feedback.  I understand it is always frustrating working
> with somebody else's code, so I'm sure it was less fun for you to deal
> with jMimeMagic than it typically is for myself.  But let's make it
> better.  I'd love to have other folks to collaborate with on this.

As you have seen from all the folks responding to this thread, there is a need for it and
people are willing to do something. There's no need to bring it here to Jakarta Commons though,
SF is totally fine.

- Jörg


