jackrabbit-dev mailing list archives

From Tobias Bocanegra <tri...@apache.org>
Subject Re: [FileVault][discuss] performance improvement proposal
Date Thu, 09 Mar 2017 13:49:01 GMT
Hi,

one issue to remember is that you can only change the compression level per
zip-entry. I didn't test too much, but the javadoc says:

public void setLevel(int level)
Sets the compression level for subsequent entries which are DEFLATED. The
default setting is DEFAULT_COMPRESSION.

I'm not exactly sure whether zip retains the dictionary when you switch
compression levels, but I would assume not. I.e., if you have a lot of small
text files interleaved with binaries, then the text files are probably not
compressed very well, which might not be a problem, though.
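
For illustration, here is a minimal sketch of switching the level per entry with java.util.zip. The class name, entry names and payloads are made up for the example; the only API relied on is ZipOutputStream.setLevel, called before putNextEntry so that it applies to the entry that follows. Note that in the ZIP format each entry is its own deflate stream, so no dictionary is carried from one entry to the next at any level.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical example class, not part of FileVault.
final class InterleavedLevels {

    // Writes a single entry at the given Deflater level.
    static void addEntry(ZipOutputStream zip, String name, byte[] data, int level)
            throws IOException {
        zip.setLevel(level); // affects the entries written after this call
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    // Packs text, binary, text so that the level changes between entries.
    static byte[] pack(byte[] text, byte[] binary) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            addEntry(zip, "a/.content.xml", text, Deflater.DEFAULT_COMPRESSION);
            addEntry(zip, "a/image.jpg", binary, Deflater.NO_COMPRESSION);
            addEntry(zip, "b/.content.xml", text, Deflater.DEFAULT_COMPRESSION);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "<jcr:root/>".getBytes(StandardCharsets.UTF_8);
        byte[] binary = new byte[1024]; // placeholder for already-compressed data
        System.out.println("zip size: " + pack(text, binary).length + " bytes");
    }
}
```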

it would be interesting to see some tests that take a typical asset
content package that has many text files (.content.xml) and a few
compressed binaries (jpegs).

- what is the size difference of the final binary with no compression at
all?
- what is the size difference of the final binary with interleaved
compression?
- what are the performance characteristics to unpack/pack the zips?
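
A rough way to start on these questions could look like the sketch below. Everything in it is an assumption for illustration: the class and method names are hypothetical, the repeated XML snippet stands in for .content.xml files, and seeded random bytes stand in for a jpeg. It only measures packing; a real test would use an actual content package and also time unpacking.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.ToIntFunction;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical measurement harness, not part of FileVault.
final class PackageSizeProbe {

    // Packs all entries, asking the policy for a Deflater level per entry name.
    static byte[] pack(Map<String, byte[]> entries, ToIntFunction<String> policy)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.setLevel(policy.applyAsInt(e.getKey()));
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    // Synthetic stand-in for a package: many small text files, one binary.
    static Map<String, byte[]> sampleEntries() {
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            xml.append("<node jcr:primaryType=\"nt:unstructured\"/>\n");
        }
        byte[] text = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] noise = new byte[64 * 1024];
        new Random(42).nextBytes(noise); // incompressible, like a jpeg body

        Map<String, byte[]> entries = new LinkedHashMap<>();
        for (int i = 0; i < 20; i++) {
            entries.put("jcr_root/content/n" + i + "/.content.xml", text);
        }
        entries.put("jcr_root/content/image.jpg", noise);
        return entries;
    }

    // Prints the resulting size and the time spent packing for one policy.
    static void report(String label, Map<String, byte[]> entries,
                       ToIntFunction<String> policy) throws IOException {
        long start = System.nanoTime();
        byte[] zip = pack(entries, policy);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.printf("%-12s %8d bytes %8d us%n", label, zip.length, micros);
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> entries = sampleEntries();
        report("none", entries, name -> Deflater.NO_COMPRESSION);
        report("all", entries, name -> Deflater.DEFAULT_COMPRESSION);
        report("interleaved", entries, name -> name.endsWith(".jpg")
                ? Deflater.NO_COMPRESSION : Deflater.DEFAULT_COMPRESSION);
    }
}
```

A single run like this is only indicative; timings in particular need warm-up and repetition before they mean anything.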

regards, toby






On Thu, Mar 9, 2017 at 8:10 PM, Thomas Mueller <mueller@adobe.com> wrote:

> Hi,
>
> > I think your help is mandatory, given the level of voodoo in the five
> > lines you propose :-)
>
> Sure, I can help.
>
> > I did some preliminary tests with the "partial entropy" method … and it
> > seems the algorithm works but it does not get as fast as the content type
> > detection method.
>
> Note you only need to test about 256 bytes, not the whole binary. Sure,
> the more the better.
>
> > Maybe ultimately we could keep both heuristics.
>
> I agree. But not to speed up things: to avoid false positives / negatives.
> Auto-detection is far from perfect.
>
> > Start with the content type detection that would match against MIME
> > types we know for sure are compressed (expected to be a reasonably fixed
> > and short list of MIME types).
>
> I would probably use the following logic:
>
> * list of mime types that should be compressed (text/plain and so on)
> * list of mime types that should not be compressed (application/zip,
> application/java-archive, and so on)
>
> For the remainder, and if you don't know the mime type, I would use
> auto-detection.
>
> Regards,
> Thomas
>
>
>
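
The decision logic in the quoted message (two MIME type lists, then auto-detection on a small sample) might be sketched roughly as follows. This is not the five-line method referred to above, which is not shown in this message; it uses a plain Shannon entropy estimate over the first 256 bytes instead. The class name, the threshold of 6.0 bits per byte and the contents of the lists beyond the three types named in the thread are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.zip.Deflater;

// Hypothetical helper, not part of FileVault.
final class CompressionHeuristic {

    // Only the types mentioned in the thread; real lists would be longer.
    static final Set<String> COMPRESS =
            new HashSet<>(Arrays.asList("text/plain"));
    static final Set<String> STORE =
            new HashSet<>(Arrays.asList("application/zip", "application/java-archive"));

    static final int SAMPLE_SIZE = 256;          // bytes inspected, as suggested above
    static final double ENTROPY_THRESHOLD = 6.0; // assumed cut-off, needs tuning

    // Shannon entropy in bits per byte over at most SAMPLE_SIZE leading bytes.
    static double sampleEntropy(byte[] data) {
        int n = Math.min(data.length, SAMPLE_SIZE);
        if (n == 0) {
            return 0.0;
        }
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) {
            counts[data[i] & 0xff]++;
        }
        double bits = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                bits -= p * (Math.log(p) / Math.log(2));
            }
        }
        return bits;
    }

    // Known MIME types win; everything else falls back to the entropy sample.
    static int levelFor(String mimeType, byte[] data) {
        if (mimeType != null) {
            String mt = mimeType.toLowerCase(Locale.ROOT);
            if (COMPRESS.contains(mt)) {
                return Deflater.DEFAULT_COMPRESSION;
            }
            if (STORE.contains(mt)) {
                return Deflater.NO_COMPRESSION;
            }
        }
        return sampleEntropy(data) > ENTROPY_THRESHOLD
                ? Deflater.NO_COMPRESSION
                : Deflater.DEFAULT_COMPRESSION;
    }
}
```

The returned value could be passed straight to ZipOutputStream.setLevel before the entry is written. Checking the lists first keeps the entropy test as a fallback, which matches the point about using the lists to avoid false positives and negatives rather than for speed.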
