jackrabbit-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: [FileVault][discuss] performance improvement proposal
Date Tue, 07 Mar 2017 10:27:37 GMT
Hi,

> As for configuration: What is the reason for having a configuration option ? 

Detecting if data is compressible can be done with low overhead, without having to look at
the content type, and without having to use configuration options:

http://stackoverflow.com/questions/7027022/how-to-efficiently-predict-if-data-is-compressible

Sample code is available in one of the answers ("I implemented a few methods to test if data
is compressible…"). It is quite simple, and only needs to process 256 bytes. Both the "Partial
Entropy" and the "Simplified Compression" work relatively well.

This is not designed to be a "perfect" solution to the problem. It is a low-overhead heuristic
that will reduce the compression overhead on average.
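
To illustrate the idea, here is a minimal sketch of an entropy check over the first 256 bytes. It is not the code from the Stack Overflow answer; the class name, the method names and the 6.5-bit threshold are assumptions chosen for illustration and would need tuning against real content.

```java
/** Illustrative sketch only: names and threshold are assumptions, not FileVault code. */
final class CompressibilityHeuristic {

    /** Number of leading bytes to inspect, as suggested in the message above. */
    static final int SAMPLE_SIZE = 256;

    /**
     * Assumed cut-off in bits per byte. A 256-byte sample of random data
     * typically measures a little above 7 bits; plain text is usually far lower.
     */
    static final double ENTROPY_THRESHOLD_BITS = 6.5;

    /** Shannon entropy (bits per byte) of the first SAMPLE_SIZE bytes. */
    static double sampleEntropy(byte[] data) {
        int n = Math.min(data.length, SAMPLE_SIZE);
        if (n == 0) {
            return 0.0;
        }
        int[] counts = new int[256];
        for (int i = 0; i < n; i++) {
            counts[data[i] & 0xFF]++;
        }
        double entropy = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy;
    }

    /** Returns true if the sample looks worth compressing. */
    static boolean looksCompressible(byte[] data) {
        return sampleEntropy(data) < ENTROPY_THRESHOLD_BITS;
    }
}
```

Note that very short inputs always measure low entropy (a 10-byte input cannot exceed about 3.3 bits), so such a check would classify them as compressible; that is harmless because compressing them is cheap anyway.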

Regards,
Thomas




On 06.03.2017 at 16:43, Timothée Maret <timothee.maret@gmail.com> wrote:

Hi, 

With Sling content distribution (using FileVault), we observe a significantly lower throughput
for content packages containing binaries.
The main bottleneck seems to be the compression algorithm applied to every element contained
in the content package.

I think that we could improve the throughput significantly, simply by not re-compressing
binaries that are already compressed.
In order to figure out which binaries are already compressed, we could match the content
type stored along with the binary against a configurable list of content types.
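
A minimal sketch of such a content type check follows. It is not taken from the patch in [0]; the class name, the method names and the example types are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only: a configurable list of content types to leave uncompressed. */
final class CompressedContentTypes {

    private final Set<String> types = new HashSet<>();

    /** @param configured content types assumed to be already compressed, e.g. "image/jpeg" */
    CompressedContentTypes(String... configured) {
        for (String type : configured) {
            types.add(normalize(type));
        }
    }

    /** Drops parameters such as "; charset=..." and lower-cases the media type. */
    static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if a binary with this content type should not be re-compressed. */
    boolean skipCompression(String contentType) {
        return contentType != null && types.contains(normalize(contentType));
    }
}
```

A binary without a stored content type falls through to normal compression, which keeps the current behaviour as the default.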

I have done some micro tests with this idea (patch in [0]). I think that the results are promising.

Exporting a single 250 MB JPEG takes 80% less time (22.4 sec -> 4.3 sec), for a 3% bigger content
package (233.2 MB -> 240.4 MB).
Exporting AEM OOTB /content/dam takes 50% less time (11.9 sec -> 5.9 sec), for a 5% bigger content
package (92.8 MB -> 97.4 MB).
Import for the same two cases is 66% and 32% faster, respectively.

I think this could be done by default, with a configurable list of the types that
skip compression.
Alternatively, it could be done at the project level, by extending FileVault with the following:

1. For each package, allow defining the default compression level (best compression, best
speed [1])
2. Expose an API that allows plugging in custom logic to decide how to compress a given artefact
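
The two points above could look roughly like the following sketch, built on java.util.zip. The interface CompressionLevelStrategy and the class PackageWriterSketch are hypothetical names, not existing FileVault APIs; only ZipOutputStream, ZipEntry and the Deflater level constants are real JDK classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical plug-in point: returns a java.util.zip.Deflater level for one artefact. */
interface CompressionLevelStrategy {
    int levelFor(String path, byte[] content);
}

/** Illustrative sketch only, not FileVault code. */
final class PackageWriterSketch {

    /**
     * Writes all artefacts into one ZIP archive.
     *
     * @param defaultLevel package-wide level, used when no strategy is given
     * @param strategy     optional per-artefact decision, may be null
     */
    static byte[] write(Map<String, byte[]> artefacts, int defaultLevel,
                        CompressionLevelStrategy strategy) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> artefact : artefacts.entrySet()) {
                int level = strategy != null
                        ? strategy.levelFor(artefact.getKey(), artefact.getValue())
                        : defaultLevel;
                // The level applies to entries started after this call.
                zip.setLevel(level);
                zip.putNextEntry(new ZipEntry(artefact.getKey()));
                zip.write(artefact.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A project could then pass, for example, `(path, content) -> path.endsWith(".jpg") ? Deflater.NO_COMPRESSION : Deflater.BEST_SPEED` as the strategy. Entries written at level 0 are still regular deflate entries, so any standard ZIP reader can extract them.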

In any case, the changes would be backward compatible. Content packages created with the new
code would be installable on instances running the old code and vice versa.

wdyt ?

Regards, 

Timothee


[0] https://github.com/tmaret/jackrabbit-filevault/tree/performance-avoid-compressing-already-compressed-binaries-based-on-content-type-detection
[1] https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#BEST_SPEED


