jackrabbit-dev mailing list archives

From: Timothée Maret <timothee.ma...@gmail.com>
Subject: Re: [FileVault][discuss] performance improvement proposal
Date: Tue, 07 Mar 2017 13:28:32 GMT
Hi Thomas,

2017-03-07 11:27 GMT+01:00 Thomas Mueller <mueller@adobe.com>:

> Hi,
>
> > As for configuration: What is the reason for having a configuration
> option ?
>
> Detecting if data is compressible can be done with low overhead, without
> having to look at the content type, and without having to use configuration
> options:
>
> http://stackoverflow.com/questions/7027022/how-to-efficiently-predict-if-data-is-compressible
>
> Sample code is available in one of the answers ("I implemented a few
> methods to test if data is compressible…"). It is quite simple, and only
> needs to process 256 bytes. Both the "Partial Entropy" and the "Simplified
> Compression" methods work relatively well.
>
> This is not designed to be a "perfect" solution to the problem. It is a
> low-overhead heuristic that will reduce the compression overhead on
> average.
>

This sounds very nice :-) We could indeed drop the configurable list of
MIME types.
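
For reference, a rough, untested sketch of such a check (hypothetical
names and an illustrative threshold, loosely following the "partial
entropy" method from that answer):

    // Estimate the Shannon entropy of the first 256 bytes; values close
    // to 8 bits/byte suggest near-random data, i.e. content that is
    // likely already compressed (or encrypted) and not worth deflating.
    static boolean looksCompressible(byte[] buffer, int len) {
        int sample = Math.min(len, 256);
        if (sample == 0) {
            return true; // nothing to measure, default to compressing
        }
        int[] counts = new int[256];
        for (int i = 0; i < sample; i++) {
            counts[buffer[i] & 0xff]++;
        }
        double entropy = 0.0;
        for (int count : counts) {
            if (count > 0) {
                double p = (double) count / sample;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy < 7.0; // threshold would need tuning
    }

A check like this could run on the first buffer read from a binary,
deciding per entry whether to deflate it or store it as-is.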

IMO we should still allow tweaking between best performance and best
compression, though, in order to accommodate different use cases.
I initially thought about covering both aspects in JCRVLT-163, but have
now changed the focus of JCRVLT-163 to avoiding the compression of
binaries (with or without auto-detection) and created JCRVLT-164 for
allowing the default compression level to be tweaked.
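
For JCRVLT-164, a minimal sketch of what the wiring could look like,
assuming the package keeps being written through the JDK zip classes
(see [1] below; the "bestSpeed" flag is a hypothetical stand-in for the
future option):

    import java.io.OutputStream;
    import java.util.zip.Deflater;
    import java.util.zip.ZipOutputStream;

    // Map a package-level "best speed vs. best compression" option onto
    // the deflater level used for all entries of the content package.
    static ZipOutputStream openPackageStream(OutputStream target,
                                             boolean bestSpeed) {
        ZipOutputStream zip = new ZipOutputStream(target);
        zip.setLevel(bestSpeed ? Deflater.BEST_SPEED
                               : Deflater.DEFAULT_COMPRESSION);
        return zip;
    }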


Regards,

Timothee


>
> Regards,
> Thomas
>
>
>
>
> On 06.03.2017 at 16:43, Timothée Maret <timothee.maret@gmail.com> wrote:
>
> Hi,
>
> With Sling content distribution (using FileVault), we observe a
> significantly lower throughput for content packages containing binaries.
> The main bottleneck seems to be the compression algorithm applied to every
> element contained in the content package.
>
> I think that we could improve the throughput significantly, simply by
> avoiding re-compressing binaries that are already compressed.
> In order to figure out which binaries are already compressed, we could
> match the content type stored alongside the binary against a configurable
> list of content types.
>
> I have done some micro-benchmarks with this idea (patch in [0]). I think
> that the results are promising.
>
> Exporting a single 250 MB JPEG is 80% faster (22.4 sec -> 4.3 sec) for a
> 3% bigger content package (233.2 MB -> 240.4 MB)
> Exporting AEM OOTB /content/dam is 50% faster (11.9 sec -> 5.9 sec) for a
> 5% bigger content package (92.8 MB -> 97.4 MB)
> Import for the same cases is 66% and 32% faster, respectively.
>
> I think this could either be done by default, with a configurable list of
> types that skip compression.
> Alternatively, it could be done at the project level, by extending FileVault
> with the following:
>
> 1. For each package, allow defining the default compression level (best
> compression, best speed)
> 2. Expose an API that allows plugging in custom logic to decide how to
> compress a given artefact
>
> In any case, the changes would be backward compatible. Content packages
> created with the new code would be installable on instances running the old
> code and vice versa.
>
> wdyt ?
>
> Regards,
>
> Timothee
>
>
> [0] https://github.com/tmaret/jackrabbit-filevault/tree/performance-avoid-compressing-already-compressed-binaries-based-on-content-type-detection
> [1] https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#BEST_SPEED
>
>
>
>


-- 
Timothée Maret
