commons-user mailing list archives

From David Castro <apudcas...@entwash.org>
Subject Re: OT: JMimeMagic (was [fileUpload] file content-type)
Date Tue, 30 May 2006 22:24:19 GMT
Jorg,

Thanks, and no problem.  And yeah, I think the library can do both, just
not at the same time, as you pointed out.  Send me some suggestions and
we'll see what we can get fixed/cleaned up.  I have a few things I need
to get ready for a new release anyway.

BTW, if LGPL is a problem for folks, just let me know.  I would consider
re-licensing it.

I'd also still consider moving the project to Jakarta Commons, so that I
can utilize the Apache services.

Cheers,
David

Jörg Schaible wrote:
> Hi David,
>
> I realize that we're getting off-topic for jakarta commons here ... ;-)
>
> David Castro wrote on Tuesday, May 30, 2006 7:36 AM:
>
>   
>> Jörg Schaible wrote:
>>     
> [snip]
>   
>>> After a quick look over the package you get the impression
>>> that you imported the magic codes of file magic into the
>>> project. And then you're quite astonished if the library
>>> does not detect simple formats (e.g. TIFF, Windows BMP) that
>>> are no problem for the C counterpart. This is IMHO a problem,
>>>
>> I did use the "magic" file to assist in generating the magic.xml file
>> bundled with the project.  You'll note that I have some, but not all,
>> of the matches cleaned up and working.  Actually, the file command
>> sometimes produces incorrect matches itself, which I didn't want to
>> inherit.  So I started with a small set of documents that I generated
>> and ran them through unit tests to verify them.
>>
>> I would never be astonished that an alpha piece of open source software
>> doesn't work exactly as expected or is limited in its out-of-the-box
>> state.  I only moonlight as an open source developer, as much as I'd
>> like it to be my full-time job ;)
>>     
>
> I assume most people here are in the same boat, including myself.
>
>   
>>> because there's simply no documentation that states
>>> otherwise. When I discovered jMimeMagic I just thought I could
>>> use it as a black box.
>>>       
>> Yeah, if you are looking for something that doesn't require a bit of
>> elbow grease, jMimeMagic wouldn't be an optimal solution since it is
>> early alpha open source software.  That's pretty normal I think.
>>     
>
> And it's pretty normal for users to expect the opposite :D
>
>   
>> Nothing else existed out there when I started this project and I only
>> had so many hours to devote to it.  But let's get the engine revved up
>> and make it more out-of-the-box-friendly.
>>     
>
> :)
>
> [snip]
>
>   
>>> But you could not decide what you wanted to implement.
>>> See, file magic has two magic files, one to produce a format
>>> description and one for the mime type. Your implementation
>>> mixes the two approaches.
>>>       
>> I decided exactly what I wanted to implement and what I wanted to
>> prepare for (at least at the time).  You're assuming that my intention
>> was to simply duplicate the "file" utility, which isn't the case.
>> Determining mime type was really only one of my intentions.
>>     
>
> Well, by naming your project j*Mime*Magic, you imply something ;-)
>
>   
>> More important to me was actually determining the specific type and
>> state of content in a stream of data.  It was initially built as a
>> helper library for a malware detection project.
>>
>>     
>>> Mime type detection is normally an action that should
>>> happen *fast*, but if I request the mime type for an MP3 you
>>> evaluate all the nested matchers that are totally moot for
>>> the mime type.
>>>       
>> Now you are talking about optimization based on one of the specific
>> uses of the library.
>>     
>
> No, I am talking about your attempt to target two different things at the same time, and
> you cannot do both efficiently.
>
>   
>> I agree with you that there are some things that can certainly be
>> done better/more efficiently.  Those need to be identified and
>> patched, but let's try not to throw the baby out with the bath water.
>>     
>
> Split the result of the parser: create specialized matchers for mime type detection and
> descriptive format detection. If you need to detect a mime type, it is typically something
> you wanna do on the fly - and fast.
>
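The split Jörg suggests can be sketched as a MIME-only detector that checks a flat table of leading-byte signatures and returns on the first hit, without walking any nested descriptive matchers. This is a minimal sketch under stated assumptions: the class `MimeSniffer`, its `detect` method and the five-entry table are hypothetical and are not jMimeMagic's API or its magic.xml contents; the byte values themselves are the standard PNG, GIF, TIFF and BMP file headers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical MIME-only detector: a flat list of leading-byte signatures,
// checked in order, returning on the first match. Not jMimeMagic's API.
public class MimeSniffer {

    // One rule: bytes expected at offset 0 and the MIME type they imply.
    static final class Signature {
        final byte[] magic;
        final String mimeType;

        Signature(byte[] magic, String mimeType) {
            this.magic = magic;
            this.mimeType = mimeType;
        }

        boolean matches(byte[] head) {
            if (head.length < magic.length) {
                return false;
            }
            for (int i = 0; i < magic.length; i++) {
                if (head[i] != magic[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Builds a Signature from int literals so values above 0x7F need no casts.
    private static Signature sig(String mimeType, int... bytes) {
        byte[] magic = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            magic[i] = (byte) bytes[i];
        }
        return new Signature(magic, mimeType);
    }

    private static final List<Signature> SIGNATURES = Arrays.asList(
        sig("image/png", 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A),
        sig("image/gif", 0x47, 0x49, 0x46, 0x38),   // "GIF8" (GIF87a / GIF89a)
        sig("image/tiff", 0x49, 0x49, 0x2A, 0x00),  // "II*", little-endian TIFF
        sig("image/tiff", 0x4D, 0x4D, 0x00, 0x2A),  // "MM" + 0x002A, big-endian TIFF
        sig("image/bmp", 0x42, 0x4D)                // "BM", Windows bitmap (short, so weak)
    );

    /** Returns the MIME type of the first matching signature, or null if none match. */
    public static String detect(byte[] head) {
        for (Signature s : SIGNATURES) {
            if (s.matches(head)) {
                return s.mimeType;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] tiffHeader = {0x49, 0x49, 0x2A, 0x00, 0x08, 0x00};
        System.out.println(detect(tiffHeader)); // image/tiff
    }
}
```

A flat, first-match table keeps the lookup cost proportional to the number of signatures; the trade-off is that short signatures such as BMP's two-byte `BM` can give false positives, which a separate descriptive matcher would then have to refine.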
>   
>>> Looking at the code:
>>>
>>>       
> [snip]
>   
>>> - your code is linked to Log4J. This is not good for
>>> libraries. See, some of our customers use completely custom
>>> logging implementations, but with commons-logging you can at
>>> least write an easy bridge.
>>>
>>>       
>> Yup, I agree with you.  Nobody has been pounding on the door asking
>> for it and I had enough work on other projects to not concern myself
>> too deeply with it.
>>     
>
> Demand, demand :)
>
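The point about commons-logging is that a library codes against a small logging interface and lets the application pick the backend. The sketch below imitates that idea without any external jar; `LibLog`, `JulLog`, `InHouseLog` and `Detector` are hypothetical names, and the real commons-logging `Log` interface is larger (it also has trace, info, warn, error and fatal levels).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical facade sketch: the library depends only on LibLog, and the
// application chooses the backend by passing an adapter. Not commons-logging itself.
public class LoggingBridgeDemo {

    /** The only logging type the library code would depend on. */
    interface LibLog {
        boolean isDebugEnabled();
        void debug(String message);
    }

    /** Default adapter onto java.util.logging, treating FINE as "debug". */
    static final class JulLog implements LibLog {
        private final Logger logger;

        JulLog(String name) {
            this.logger = Logger.getLogger(name);
        }

        public boolean isDebugEnabled() {
            return logger.isLoggable(Level.FINE);
        }

        public void debug(String message) {
            logger.fine(message);
        }
    }

    /** A customer's in-house logger, bridged with one small adapter. */
    static final class InHouseLog implements LibLog {
        final List<String> lines = new ArrayList<>();

        public boolean isDebugEnabled() {
            return true;
        }

        public void debug(String message) {
            lines.add("[DEBUG] " + message);
        }
    }

    /** Library code: it never names a concrete logging backend. */
    static final class Detector {
        private final LibLog log;

        Detector(LibLog log) {
            this.log = log;
        }

        void scan(String resourceName) {
            log.debug("scanning " + resourceName);
            // ... content matching would happen here ...
        }
    }

    public static void main(String[] args) {
        // java.util.logging defaults to INFO, so this FINE message is not printed.
        new Detector(new JulLog("detector")).scan("a.bin");
        InHouseLog custom = new InHouseLog();
        new Detector(custom).scan("report.bin");
        System.out.println(custom.lines); // [[DEBUG] scanning report.bin]
    }
}
```

A customer with an in-house logging system only writes an adapter like `InHouseLog`; the library code in `Detector` stays unchanged.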
>   
>>> - you never guard log.debug with log.isDebugEnabled - and you create
>>> *a lot* of debug output
>>>       
>> Yup, certainly an area for making the library more efficient.  Again,
>> completely aware of the issue...just haven't fixed it yet.
>>
>>     
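Guarding matters because the message string is built before `debug` is called, whether or not the message is ever written. This sketch uses java.util.logging as a stand-in (`isLoggable(Level.FINE)`); with commons-logging or Log4J the equivalent guard is `log.isDebugEnabled()`. The `describe` method is a hypothetical placeholder for an expensive debug message such as a hex dump.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of guarded vs. unguarded debug logging, using java.util.logging
// as a stand-in for commons-logging / Log4J.
public class GuardedDebugDemo {

    static final Logger LOG = Logger.getLogger(GuardedDebugDemo.class.getName());

    // Counts how often the (hypothetical) expensive description is built.
    static int describeCalls = 0;

    // Stands in for something costly, e.g. hex-dumping a buffer for a debug line.
    static String describe(byte[] data) {
        describeCalls++;
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString();
    }

    static void matchUnguarded(byte[] data) {
        // The argument string is built even when FINE is disabled and the call is a no-op.
        LOG.fine("testing bytes: " + describe(data));
    }

    static void matchGuarded(byte[] data) {
        // The string is only built if FINE messages would actually be logged.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("testing bytes: " + describe(data));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level (FINE) output disabled
        byte[] data = {0x49, 0x49, 0x2A, 0x00};
        matchUnguarded(data);
        matchGuarded(data);
        System.out.println(describeCalls); // 1: only the unguarded call paid for describe()
    }
}
```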
>>> - file magic also has its limits, as already explained in
>>> this thread. You already introduced regexp support, but so far
>>> you don't use it properly, e.g. for the HTML types
>>>       
>> Definitely limits, and as I mentioned, I was already moving toward a
>> more pluggable matcher architecture and have already coded
>> adjustments to support it.
>>     
>
> This is functionality that *I* am not that interested in ... the mime type can normally
> be detected quite easily with the standard patterns.
>
>   
>> And if my HTML regex matcher is
>> broken...
>>     
>
> Well, you have some of those non-regex, fixed-position HTML matching definitions in your
> magic.xml that are also present in file magic's definitions and that don't work too well.
>
>   
>> please send me a patch =)  I've been calling for folks to help build
>> out a complete set of matchers for more content types, but with
>> limited responses.
>>     
>
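Regular expressions avoid the weakness of fixed-offset HTML entries, which only fire when a tag starts at an exact byte position. A minimal sketch, assuming a 1024-byte look-ahead window and a small hypothetical list of marker tags; neither is taken from jMimeMagic's magic.xml, and `HtmlSniffer` is an illustrative name.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Sketch of regex-based HTML sniffing instead of fixed-offset matching.
// WINDOW and the tag list are illustrative assumptions.
public class HtmlSniffer {

    private static final int WINDOW = 1024;

    // Case-insensitive match for a doctype or an opening html/head/body tag
    // anywhere in the window, followed by whitespace or '>'.
    private static final Pattern HTML_MARKER = Pattern.compile(
        "<(!doctype\\s+html|html|head|body)[\\s>]",
        Pattern.CASE_INSENSITIVE);

    public static boolean looksLikeHtml(byte[] head) {
        int len = Math.min(head.length, WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary input cannot fail decoding.
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        return HTML_MARKER.matcher(text).find();
    }

    public static void main(String[] args) {
        byte[] page = "\n  <!DOCTYPE html>\n<html><body>hi</body></html>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeHtml(page)); // true, although '<' is not at offset 0
    }
}
```

Requiring whitespace or `>` after the tag name keeps `<header>` from being mistaken for `<head>`.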
> Just to clarify, when I first looked at jMimeMagic, it was just a few days before you
> posted your call for help. So the project looked to me like a lot of other abandoned projects
> on SF with a one-time dump of some experimental code. Therefore I wanna apologize for the
> bad reputation I gave your project in one of my first postings in this thread.
>
>   
>> I usually just scratch my own itches.  I've also determined that I am
>> a pretty lousy mind reader ;)
>>     
>
> :)
>
>   
>>> OK, some of the problems would have been solved by
>>> providing my own magic.xml file. E.g. one of my mistakes with
>>> the library was that I assumed the magic file was read
>>> every time you create a Magic instance and that you would have to
>>> synchronize the initialization of the instance if you want
>>> to share it. This assumption was wrong, but I only found that out
>>> by looking at the code, not by reading the javadocs.
>>>       
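If shared matcher definitions do have to be loaded lazily and safely, the initialization-on-demand holder idiom gives one-time, thread-safe loading without explicit synchronization. This is a hypothetical sketch: `MagicDefinitions` and `loadFromXml` are illustrative names, the "XML parsing" is a hard-coded list, and it does not describe how jMimeMagic actually initializes itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: load matcher definitions once and share them across
// threads using the initialization-on-demand holder idiom.
public class MagicDefinitions {

    // Counts real loads so the "load once" behaviour can be observed.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final List<String> rules;

    private MagicDefinitions(List<String> rules) {
        this.rules = rules;
    }

    // Stand-in for parsing a magic.xml file; here it just returns a fixed list.
    private static MagicDefinitions loadFromXml() {
        LOADS.incrementAndGet();
        return new MagicDefinitions(Collections.unmodifiableList(
            Arrays.asList("image/png", "image/tiff")));
    }

    // The JVM initializes Holder at most once, on first access, and class
    // initialization is thread-safe, so callers need no synchronized block.
    private static final class Holder {
        static final MagicDefinitions INSTANCE = loadFromXml();
    }

    public static MagicDefinitions get() {
        return Holder.INSTANCE;
    }

    public List<String> rules() {
        return rules;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> MagicDefinitions.get().rules());
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(LOADS.get()); // 1
    }
}
```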
>> Yeah...documentation is the first to go =(  I try to keep my projects
>> clean, organized, and as simple as possible though.  So if you browse,
>> you should get a good feel for what is going on.  It's not always
>> beautiful or elegant, but you shouldn't find any obfuscated code...heh
>>
>> Thanks for the feedback.  I understand it is always frustrating
>> working with somebody else's code, so I'm sure it was less fun for you
>> to deal with jMimeMagic than it typically is for me.  But let's make
>> it better.  I'd love to have other folks to collaborate with on this.
>>     
>
> As you have seen from all the folks responding to this thread, there is a need for it
> and people are willing to do something. There's no need to bring it to Jakarta Commons,
> though; SF is totally fine.
>
> - Jörg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>
>
>   
