commons-user mailing list archives

From David Castro <apudcas...@entwash.org>
Subject Re: JMimeMagic (was [fileUpload] file content-type)
Date Tue, 30 May 2006 05:36:18 GMT
Jörg Schaible wrote:
> Hi David,
>
> Sorry for the delay, but I had to do some research again to give some more
> substantial answers.
>
> David Castro wrote on Thursday, May 25, 2006 10:38 AM:
>
>   
>>> Hi Brian and Mark,
>>>
>>> Brian K. Wallace wrote on Tuesday, April 18, 2006 9:18 PM:
>>>
>>>
>>>       
>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>> Hash: SHA1
>>>>>
>>>>> Just be conscious of the fact that, with all open source projects,
>>>>> time is usually volunteer/as available/as the urge strikes. I
>>>>> wouldn't start to get anxious for a couple of weeks. (some take
>>>>> longer, but I'm anxious by then)
>>>>>
>>>>> As for forking -> commons, remember licensing issues. GPL/LGPL !=
>>>>> ASL. In order for ASL to come into the picture you'd have to not
>>>>> fork but start from scratch. IANAL, but that's how it's been
>>>>> presented before. 
>>>>>           
>>> Starting from scratch would be possibly the best anyway. I
>>> had it also on my todo list on a very low priority ... but
>>> just, because I found that jMimeMagic has a really worse
>>> implementation - extremely slow and not working correctly. I
>>> have a good pile of image files it does not detect. Main
>>> reason is, that the implementation is simply
>> What exactly is extremely slow and not working correctly?
>>     
>
> After a quick look over the package you get the impression that you imported the
> magic codes of file magic into the project. And then you're quite astonished if the
> library does not detect simple formats (e.g. TIFF, Windows BMP) that are no problem
> for the C counterpart. This is IMHO a problem,
I did use the "magic" file to assist in generating the magic.xml file 
bundled with the project.  You'll note that I have some, but not all of 
the matches cleaned up and working.  Actually, the file command will 
sometimes have incorrect matches itself, which I didn't want to 
inherit.  So, I started with a small set of documents that I generated 
and ran them through unit tests to verify them.

I would never be astonished that an alpha piece of open source software 
doesn't work exactly as expected or is limited in its out-of-box 
state.  I only moonlight as an open source developer, as much as I'd like 
it to be my full-time job ;)
> because there's simply no documentation that states something else. When I
> detected jMimeMagic I just thought to use it as a black box.
>
>   
Yeah, if you are looking for something that doesn't require a bit of 
elbow grease, jMimeMagic wouldn't be an optimal solution since it is 
early alpha open source software.  That's pretty normal I think.  
Nothing else existed out there when I started this project and I only 
had so many hours to devote to it.  But let's get the engine revved up 
and make it more out-of-the-box-friendly.
>> There are lots of things that don't detect out of the box right now,
>> since only a subset of magic rules are defined in the magic.xml file.
>>     
>>> wrong. The original magic files have a clear idea of
>>> precedence of patterns - this has been lost completely in the
>>> conversion/implementation of jMimeMagic.
>>     
>> What is simply wrong about the implementation?  Precedence of matchers
>> is a part of the current implementation, so I'm not sure what
>> you mean.
>> jMimeMagic wasn't a conversion, it was an implementation written from
>> scratch. 
>>     
>
> But you could not decide what you wanted to implement. See, file magic has two
> magic files, one to produce a format description and one for the mime type. Your
> implementation mixes the two approaches.
I decided exactly what I wanted to implement and what I wanted to 
prepare for (at least at the time).  You're assuming that my intention 
was to simply duplicate the "file" utility, which isn't the case. 
Determining mime type was really only one of my intentions.  More important 
to me was actually determining the specific type and state of content in 
a stream of data.  It was initially built as a helper library for a 
malware detection project.
> Mime type detection is normally an action that should happen *fast*, but if I
> request the mime type for an MP3 you evaluate all the nested matchers that are
> totally moot for the mime type.
>
>   
Now you are talking about optimization based on one of the specific uses 
of the library.   I agree with you that there are some things that can 
certainly be done better/more efficiently.  Those need to be identified 
and patched, but let's try not to throw the baby out with the bath water.
> Looking at the code:
>
> - what's the real difference between MagicMatch and MagicMatcher? Even the javadoc
>   is the same ...
>   
Yeah, you can open up the source to see what is going on (I see the 
javadoc is incorrect =P).  The MagicMatch object represents the data for 
an entry in the magic.xml file.  The MagicMatcher is a wrapper that has 
the logic for testing streams of data to detect if a MagicMatch entry 
matches.  There are plenty of adjustments that need to be made here though.
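
As a rough illustration of that split between data and test logic, here is a minimal sketch. The names `MatcherSketch`, `Entry` and `Matcher` are hypothetical stand-ins chosen for this example and are not the actual jMimeMagic classes or their API; a real entry would carry more than an offset and a byte pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified stand-ins; NOT the real jMimeMagic classes.
class MatcherSketch {

    /** Plays the role of MagicMatch: plain data for one magic.xml entry. */
    static final class Entry {
        final String mimeType;
        final int offset;     // where in the data the magic bytes must appear
        final byte[] magic;   // the bytes expected at that offset

        Entry(String mimeType, int offset, byte[] magic) {
            this.mimeType = mimeType;
            this.offset = offset;
            this.magic = magic;
        }
    }

    /** Plays the role of MagicMatcher: wraps an Entry and holds the test logic. */
    static final class Matcher {
        private final Entry entry;

        Matcher(Entry entry) {
            this.entry = entry;
        }

        /** Returns the entry's MIME type if the data matches, otherwise null. */
        String test(byte[] data) {
            int end = entry.offset + entry.magic.length;
            if (data.length < end) {
                return null; // too short to contain the magic bytes
            }
            byte[] window = Arrays.copyOfRange(data, entry.offset, end);
            return Arrays.equals(window, entry.magic) ? entry.mimeType : null;
        }
    }

    public static void main(String[] args) {
        Entry gif = new Entry("image/gif", 0, "GIF8".getBytes(StandardCharsets.US_ASCII));
        Matcher matcher = new Matcher(gif);
        // A GIF header matches the entry; a BMP header does not.
        System.out.println(matcher.test("GIF89a".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(matcher.test("BM".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Keeping the entry as pure data means the same rule could be loaded from XML or built in code, while the matching behaviour lives in one place.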
> - your code is linked to Log4J. This is not good for libraries. See, some of our
>   customers use completely their own logging implementations, but with
>   commons-logging you can at least write an easy bridge
>   
Yup, I agree with you.  Nobody has been pounding on the door asking for 
it and I had enough work on other projects to not concern myself too 
deeply with it.
> - you never guard log.debug with log.isDebug - and you create *a lot* of debug output
>   
Yup, certainly an area for making the library more efficient.  Again, 
completely aware of the issue...just haven't fixed it yet.
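
For readers unfamiliar with the guard being discussed, here is a minimal sketch of why it matters. It uses `java.util.logging` only so the example runs without extra jars; that is an assumption of this sketch, not what jMimeMagic uses. With Log4J the equivalent check is `isDebugEnabled()`. The class name `GuardedLogging` and the `hexDump` helper are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: java.util.logging stands in for Log4J so this runs on a bare JDK.
class GuardedLogging {
    static final Logger LOG = Logger.getLogger("GuardedLogging");

    // Counts how often the costly debug message was actually built.
    static int dumpsBuilt = 0;

    // Hypothetical stand-in for an expensive debug message, e.g. a buffer dump.
    static String hexDump(byte[] data) {
        dumpsBuilt++;
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    static void unguarded(byte[] data) {
        // The argument is evaluated before the call, so the dump is built
        // even when debug-level output is switched off.
        LOG.fine("read: " + hexDump(data));
    }

    static void guarded(byte[] data) {
        // The level check skips the string building entirely when disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("read: " + hexDump(data));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level (FINE) output disabled
        byte[] data = {0x47, 0x49, 0x46};
        unguarded(data);
        guarded(data);
        System.out.println("dumps built: " + dumpsBuilt);
    }
}
```

With debug output disabled, only the unguarded call pays for building the message, which adds up quickly in a library that logs heavily.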
> - file magic has also its limits as already explained in this thread. You already
>   introduced regexp support, but you don't use it properly e.g. for the HTML types
>   so far
>   
Definitely limits, and as I mentioned I was already moving and have 
already coded adjustments to support more of a pluggable matcher 
architecture.  And if my HTML regex matcher is broken...please send me a 
patch =)  I've been calling for folks to help build out a complete set 
of matchers for more content types, but with limited responses.

I usually just scratch my own itches.  I've also determined that I am a 
pretty lousy mind reader ;)
> OK, some of the problems would have been solved by providing my own magic.xml file.
> E.g. one of my mistakes with the library was that I assumed the magic file was read
> every time you create a Magic instance and that you would have to synchronize the
> initialization of the instance if you want to share it. This assumption was wrong,
> but I only found that out by looking at the code - not by reading the javadocs.
>
>   
Yeah...documentation is the first to go =(  I try to keep my projects 
clean, organized, and as simple as possible though.  So if you browse, 
you should get a good feel for what is going on.  It's not always 
beautiful or elegant, but you shouldn't find any obfuscated code...heh
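
To make the point about reading the rules only once concrete, here is one common way such one-time, thread-safe initialization can be done in Java. This is a generic sketch of the initialization-on-demand holder idiom, not a reproduction of how jMimeMagic actually loads magic.xml; `SharedRules`, `parse` and the rule strings are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Generic sketch of one-time initialization; not the actual jMimeMagic code.
class SharedRules {
    // Counts how often the (hypothetical) rules file was parsed.
    static int parseCount = 0;

    // Hypothetical stand-in for parsing magic.xml into a list of rules.
    private static List<String> parse() {
        parseCount++;
        return Collections.unmodifiableList(Arrays.asList("image/gif", "image/png"));
    }

    // The JVM initializes Holder once, on first access, with its own locking,
    // so callers need no explicit synchronization.
    private static final class Holder {
        static final List<String> RULES = parse();
    }

    static List<String> rules() {
        return Holder.RULES;
    }

    public static void main(String[] args) {
        rules();
        rules();
        System.out.println("parsed " + parseCount + " time(s)");
    }
}
```

Because the parsed list is immutable and built exactly once, it can be shared freely between callers and threads.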

Thanks for the feedback.  I understand it is always frustrating working 
with somebody else's code, so I'm sure it was less fun for you to deal 
with jMimeMagic than it typically is for me.  But let's make it 
better.  I'd love to have other folks to collaborate with on this.
> - Jörg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-user-help@jakarta.apache.org
>
>
>   

