jackrabbit-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: [FileVault][discuss] performance improvement proposal
Date Thu, 09 Mar 2017 11:10:42 GMT
Hi,

> I think your help is mandatory, given the level of voodoo in the five lines you propose :-)

Sure, I can help.

> I did some preliminary tests with the "partial entropy" method … and it seems the algorithm works but it does not get as fast as the content type detection method.

Note that you only need to test about 256 bytes, not the whole binary. Of course, the more bytes you test, the more reliable the result.
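
As a rough illustration of testing only the head of the binary, here is a minimal sketch in Python. It computes the plain Shannon entropy of the first 256 bytes; this is not necessarily the "partial entropy" method discussed in this thread. The names sample_entropy and looks_compressed, and the 6.5 bits-per-byte threshold, are assumptions made for the example.

```python
import math
from collections import Counter

SAMPLE_SIZE = 256  # number of leading bytes to inspect (figure taken from the message)


def sample_entropy(data: bytes, sample_size: int = SAMPLE_SIZE) -> float:
    """Shannon entropy, in bits per byte, of the first sample_size bytes."""
    sample = data[:sample_size]
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)  # histogram of byte values in the sample
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_compressed(data: bytes, threshold: float = 6.5) -> bool:
    """Guess whether data is already compressed.

    The threshold is an assumed value for illustration, not a tuned one.
    """
    return sample_entropy(data) >= threshold
```

Plain text usually stays well below the threshold because it uses a small alphabet, while compressed or encrypted data spreads across most byte values. A larger sample narrows the gap between the estimate and the true entropy, at the cost of reading more bytes.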

> Maybe ultimately we could keep both heuristics.

I agree, though not to speed things up: the point is to avoid false positives and false negatives. Auto-detection is far from perfect.

> Start with the content type detection that would match against MIME types we know for sure are compressed (expected to be a reasonably fixed and short list of MIME types).

I would probably use the following logic:

* list of mime types that should be compressed (text/plain and so on)
* list of mime types that should not be compressed (application/zip, application/java-archive, and so on)

For the remainder, and if you don't know the mime type, I would use auto-detection.
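
The logic above could be sketched roughly as follows, in Python. The two sets contain only the examples named in this message and would need to be extended in practice. should_compress is a hypothetical name, and looks_compressed is a hypothetical callable standing in for whatever auto-detection heuristic is chosen.

```python
from typing import Callable, Optional

# Illustrative lists holding only the examples named in the message.
COMPRESS = {"text/plain"}
DO_NOT_COMPRESS = {"application/zip", "application/java-archive"}


def should_compress(
    mime_type: Optional[str],
    head: bytes,
    looks_compressed: Callable[[bytes], bool],
) -> bool:
    """Decide whether a binary should be compressed.

    Known mime types are decided from the two lists; everything else,
    including a missing mime type, falls back to auto-detection on the
    leading bytes of the binary.
    """
    if mime_type is not None:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        base = mime_type.split(";", 1)[0].strip().lower()
        if base in COMPRESS:
            return True
        if base in DO_NOT_COMPRESS:
            return False
    # Unknown or missing mime type: compress unless the data already looks compressed.
    return not looks_compressed(head)
```

Checking the lists first means a wrong guess by the auto-detection can only affect mime types that are not on either list.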

Regards,
Thomas

