From: Timothée Maret
Date: Tue, 7 Mar 2017 09:53:06 +0100
Subject: Re: [FileVault][discuss] performance improvement proposal
To: dev@jackrabbit.apache.org

Hi,

2017-03-06 20:52 GMT+01:00 Felix Meschberger:

> Hi
>
> This looks great.
>
> As for configuration: what is the reason for having a configuration
> option? Not being able to decide? Or a real customer need for having it
> configurable?

Setting the compression level is a tradeoff between compression speed
and the size of the compressed artefacts.

IMO, different use cases favour maximising one or the other, or keeping
the current default, which is a compromise between the two.

For instance, as I see it, Sling Content Distribution would maximise
compression speed, whereas the AEM Quickstart would maximise the
compression ratio (i.e. minimise the size) of its content packages.

Thus, IMO, it makes sense to allow configuring the compression level
per use case (not globally).

> I think we should start with reasonable heuristics first and consider
> configuration options in case there is a need/desire.

I have opened JCRVLT-163 to track this. We could indeed add the
configuration later, assuming the increased package size (expected to be
< 5% for packages containing already compressed binaries, 0% for other
packages) is not an issue even for size-sensitive use cases (such as the
AEM Quickstart).

Regards,

Timothee

> Regards
> Felix
>
> On 06.03.2017 at 16:43, Timothée Maret wrote:
>
> Hi,
>
> With Sling content distribution (using FileVault), we observe a
> significantly lower throughput for content packages containing binaries.
> The main bottleneck seems to be the compression algorithm applied to
> every element contained in the content package.
>
> I think that we could improve the throughput significantly, simply by
> not re-compressing binaries that are already compressed.
> To figure out which binaries are already compressed, we could match the
> content type stored along with the binary against a configurable list
> of content types.
>
> I have done some micro tests with this idea (patch in [0]). I think that
> the results are promising.
>
> Exporting a single 250 MB JPEG is 80% faster (22.4 sec -> 4.3 sec) for a
> 3% bigger content package (233.2 MB -> 240.4 MB)
> Exporting AEM OOTB /content/dam is 50% faster (11.9 sec -> 5.9 sec) for a
> 5% bigger content package (92.8 MB -> 97.4 MB)
> Import for the same cases is 66% and 32% faster, respectively.
>
> I think this could be done by default, with the list of content types
> that skip compression being configurable.
> Alternatively, it could be done on a project level, by extending
> FileVault with the following:
>
> 1. For each package, allow defining the default compression level (best
> compression, best speed)
> 2. Expose an API that allows plugging in custom logic to decide how to
> compress a given artefact
>
> In any case, the changes would be backward compatible. Content packages
> created with the new code would be installable on instances running the
> old code and vice versa.
>
> wdyt ?
>
> Regards,
>
> Timothee
>
>
> [0] https://github.com/tmaret/jackrabbit-filevault/tree/performance-avoid-compressing-already-compressed-binaries-based-on-content-type-detection
> [1] https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#BEST_SPEED
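To make the idea discussed in this thread concrete, here is a minimal
sketch of choosing the compression level per entry from its content
type, using only java.util.zip. It is not the patch in [0] and not
FileVault's API: the names CompressionPolicy, COMPRESSED_TYPES,
BY_CONTENT_TYPE, addEntry and packedSize are hypothetical, and the list
of content types is only an illustrative default that would be
configurable in the proposal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SelectiveCompression {

    // Hypothetical extension point: maps an entry's content type to a Deflater level.
    interface CompressionPolicy {
        int levelFor(String contentType);
    }

    // Illustrative list only (an assumption); the proposal makes this configurable.
    static final Set<String> COMPRESSED_TYPES = new HashSet<>(Arrays.asList(
            "image/jpeg", "image/png", "application/zip", "video/mp4"));

    // Skip compression for types assumed to be compressed already,
    // keep the default level for everything else.
    static final CompressionPolicy BY_CONTENT_TYPE = contentType ->
            COMPRESSED_TYPES.contains(contentType)
                    ? Deflater.NO_COMPRESSION
                    : Deflater.DEFAULT_COMPRESSION;

    // Writes one entry; setLevel applies to the entries written after the call.
    static void addEntry(ZipOutputStream zip, String name, String contentType,
                         byte[] data, CompressionPolicy policy) throws IOException {
        zip.setLevel(policy.levelFor(contentType));
        zip.putNextEntry(new ZipEntry(name));
        zip.write(data);
        zip.closeEntry();
    }

    // Returns the size of a one-entry archive, to compare the two code paths.
    static int packedSize(String contentType, byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            addEntry(zip, "entry", contentType, data, BY_CONTENT_TYPE);
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024]; // all zeros, so highly compressible
        System.out.println("text/plain: " + packedSize("text/plain", data) + " bytes");
        System.out.println("image/jpeg: " + packedSize("image/jpeg", data) + " bytes");
    }
}
```

Entries written at NO_COMPRESSION are still DEFLATED entries, so the
archive stays a regular zip, which is consistent with the backward
compatibility expectation stated above. A per-package default level
(option 1) could be sketched the same way by passing a different policy.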