jackrabbit-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (JCRVLT-163) Allow to define compression levels
Date Tue, 07 Mar 2017 11:50:37 GMT

    [ https://issues.apache.org/jira/browse/JCRVLT-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899314#comment-15899314 ]

Thomas Mueller commented on JCRVLT-163:

I guess the vast majority of binary data is stored in "jcr:data" properties, but not all of it,
and the list of possible MIME types is relatively large. Also, for some MIME types, the data is
not always compressed (TIFF, for example).

For such cases, it might be interesting to use [auto-detection|http://stackoverflow.com/questions/7027022/how-to-efficiently-predict-if-data-is-compressible].
This can be done with low overhead, without knowing the content type, and without configuration
options. Sample code is available in one of the answers ("I implemented a few methods to test
if data is compressible…"). It is quite simple, and only needs to process 256 bytes. Both
the "Partial Entropy" and the "Simplified Compression" work relatively well. This is not designed
to be a "perfect" solution for the problem. It is a low-overhead heuristic that will reduce
the compression overhead on average.
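
As a rough illustration of this kind of check, here is a minimal Java sketch that estimates the byte entropy of the first 256 bytes. It is not the code from the linked answer. The class name, the method names, and the 6.5-bit threshold are assumptions chosen for illustration and would need tuning against real repository content.

```java
/**
 * Hypothetical helper (not part of FileVault): guesses whether data is worth
 * compressing by estimating the Shannon entropy of a small leading sample.
 */
final class CompressibilityCheck {

    /** Number of leading bytes to inspect, as suggested in the comment above. */
    static final int SAMPLE_SIZE = 256;

    /**
     * Assumed cut-off in bits per byte. A 256-byte sample of random or already
     * compressed data usually measures a little above 7 bits; text is far lower.
     */
    static final double ENTROPY_THRESHOLD_BITS = 6.5;

    /** Returns the empirical entropy (bits per byte) of the leading sample. */
    static double sampleEntropy(byte[] data) {
        int n = Math.min(data.length, SAMPLE_SIZE);
        if (n == 0) {
            return 0.0;
        }
        // Histogram of byte values in the sample.
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) {
            counts[data[i] & 0xff]++;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy;
    }

    /**
     * Heuristic only: samples shorter than SAMPLE_SIZE cannot reach high
     * entropy values, so very small inputs are always reported as compressible.
     */
    static boolean isLikelyCompressible(byte[] data) {
        return sampleEntropy(data) < ENTROPY_THRESHOLD_BITS;
    }
}
```

A caller could then pick {{NO_COMPRESSION}} whenever isLikelyCompressible returns false, independent of the MIME type.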

> Allow to define compression levels
> ----------------------------------
>                 Key: JCRVLT-163
>                 URL: https://issues.apache.org/jira/browse/JCRVLT-163
>             Project: Jackrabbit FileVault
>          Issue Type: Improvement
>          Components: Packaging
>    Affects Versions: 3.0
>            Reporter: Timothee Maret
>             Fix For: 3.1.36
> As discussed in [0], this issue tracks support for specifying the compression level when building packages. The primary idea is to avoid compressing (compression level = {{NO_COMPRESSION}}) already compressed binaries, identified based on their MIME type.
> Setting the compression level is a tradeoff between the compression speed and the size of the compressed artefacts.
> Different use cases likely favour maximising either of the two.
> Therefore, it may make sense to allow configuring the compression levels per use case (not globally).
> A generic way to express this configuration would be:
> * a mapping from MIME type to compression level
> * the default level (for MIME type not matching any entry in the mapping)
> [0] https://www.mail-archive.com/dev@jackrabbit.apache.org/msg37807.html
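
For illustration, the generic configuration described in the issue (a MIME type to level mapping plus a default level) could be sketched in Java as follows. The class and method names are hypothetical and do not reflect any existing FileVault API; only the java.util.zip.Deflater level constants are taken from the JDK.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.Deflater;

/** Hypothetical configuration holder: MIME type -> compression level, with a default. */
final class CompressionLevels {

    private final Map<String, Integer> levelByMimeType = new HashMap<>();
    private final int defaultLevel;

    /** @param defaultLevel level used for MIME types without an explicit entry */
    CompressionLevels(int defaultLevel) {
        this.defaultLevel = defaultLevel;
    }

    /** Registers a level for one MIME type; matching is case-insensitive. */
    CompressionLevels map(String mimeType, int level) {
        levelByMimeType.put(mimeType.toLowerCase(Locale.ROOT), level);
        return this;
    }

    /** Returns the configured level, or the default if the type is unknown or null. */
    int levelFor(String mimeType) {
        if (mimeType == null) {
            return defaultLevel;
        }
        return levelByMimeType.getOrDefault(mimeType.toLowerCase(Locale.ROOT), defaultLevel);
    }

    /** Example setup (an assumption): skip compression for already compressed formats. */
    static CompressionLevels example() {
        return new CompressionLevels(Deflater.DEFAULT_COMPRESSION)
                .map("image/jpeg", Deflater.NO_COMPRESSION)
                .map("application/zip", Deflater.NO_COMPRESSION)
                .map("text/plain", Deflater.BEST_COMPRESSION);
    }
}
```

Keeping one such object per use case, rather than a global setting, would match the "per use case (not globally)" suggestion in the issue.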

This message was sent by Atlassian JIRA
