From: Bert Huijben
Date: Mon, 29 Oct 2018 14:27:16 +0100
To: Branko Čibej
Cc: dev@subversion.apache.org
Subject: Re: [PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)
On Windows' NTFS implementation, very small files (probably something like < 256 bytes, but this is not documented/strictly stable) are stored inside their Master File Table (MFT) record and so don't use 'a whole cluster'.

Nice work on all the research!

    Bert

On Tue, Oct 23, 2018 at 6:12 PM, Branko Čibej <brane@apache.org> wrote:
On 22.10.2018 22:14, Evgeny Kotkov wrote:
> Branko Čibej <brane@apache.org> writes:
>
>> Still missing is a mechanism for the libsvn_wc (and possibly
>> libsvn_client) to determine the capabilities of the working copy at
>> runtime (this will be needed for deciding whether to use compressed
>> pristines).
> FWIW, I tried the idea of using LZ4 to compress the pristines and storing small
> pristines as blobs in the `PRISTINE` table.  I was particularly interested in
> how such a change would affect the performance and what kind of obstacles
> would have to be dealt with.

Nice! I did some simpler tests by compressing exported trees, but this
is definitely better.

> In the attachment you will find a more or less functional implementation of
> this idea that might be useful to some extent.  The patch is a proof of
> concept: it doesn't include the WC compatibility bits and most certainly
> doesn't have everything necessary in place.  But in the meantime, I think
> that it might give a good approximation of what can be expected from the
> approach.
>
> The patch applies to the `better-pristines` branch.
>
> A couple of observations:
>
>  - As expected, the combined size of the pristines is halved when the data
>    itself is compressible, thus making the working copy 25% smaller.
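A quick check of the 25% figure, assuming (a simplification not stated in the thread) that the pristine store holds roughly one copy of every working file, so both take about the same space S, and ignoring wc.db and other metadata:

```latex
\begin{aligned}
\text{before} &= S_{\text{work}} + S_{\text{pristine}} = S + S = 2S \\
\text{after}  &= S + \tfrac{1}{2}S = \tfrac{3}{2}S \\
\text{saving} &= \frac{2S - \tfrac{3}{2}S}{2S} = \frac{1}{4} = 25\%
\end{aligned}
```

Halving only the pristine half of the working copy therefore saves a quarter of the total.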

Yes, that was my observation as well. In fact, though, storing small
BLOBs in the database itself should have even better effects, since the
space on disk actually used by a file is rounded up to the nearest
cluster size, whereas SQLite's pages are typically much smaller than that.
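To put rough numbers on the cluster-rounding argument, here is a small C sketch (not Subversion code) of how much space one file occupies when allocation is rounded up to whole clusters. The `resident_limit` parameter is a hypothetical knob modelling the NTFS behaviour described at the top of this message; since the real threshold is undocumented, pass 0 to ignore it.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a file of `size` bytes occupies on a volume whose allocation
 * unit is `cluster` bytes.  Files of at most `resident_limit` bytes are
 * treated as using no data clusters at all (hypothetical model of NTFS
 * keeping tiny files in their metadata record; pass 0 to disable). */
static uint64_t on_disk_size(uint64_t size, uint64_t cluster,
                             uint64_t resident_limit)
{
    assert(cluster > 0);
    if (size <= resident_limit)
        return 0;
    /* Round up to the next multiple of the cluster size. */
    return (size + cluster - 1) / cluster * cluster;
}

/* Bytes allocated to the file but not holding any of its data. */
static uint64_t slack_bytes(uint64_t size, uint64_t cluster,
                            uint64_t resident_limit)
{
    uint64_t used = on_disk_size(size, cluster, resident_limit);
    return used > size ? used - size : 0;
}
```

For example, 1,000 pristines of 300 bytes each on a volume with 4 KiB clusters would take 1000 * on_disk_size(300, 4096, 0) = 4,096,000 bytes to hold 300,000 bytes of data. Stored as BLOBs, many such rows can share a single database page, which is where the saving comes from.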


>  - A variety of callers currently access the pristine contents by reading
>    the corresponding files.  That doesn't work in the case of compressed pristines
>    or pristines stored as BLOBs.
>
>    I think that ideally we would want to use streams as much as possible, and
>    only spill the uncompressed pristine contents to temporary files when we
>    need to pass them to external tools, etc.; and that the temporary files need
>    to be backed by a work queue to avoid leaving them in place in case of an
>    application crash.

Yes and yes. Keeping those temporary spilled files on disk could turn
out to be a problem; finding a reasonable time to delete them without
having to run cleanup will be rather important, I think.
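One way to read the "backed by a work queue" suggestion, sketched in standard C11 stdio (plus the POSIX ENOENT error code): the name of the spilled file is journalled before the file is created, so a later cleanup pass can delete it even if the process crashed in between. The journal file and the functions wq_add_delete(), wq_run() and spill_to_temp() are hypothetical stand-ins; Subversion's own work queue is kept in wc.db and looks different.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Record an intent to delete `path` in the journal; 0 on success. */
static int wq_add_delete(const char *journal, const char *path)
{
    FILE *f = fopen(journal, "a");
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", path) >= 0;
    /* fclose() flushes; a real implementation would also fsync. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Delete every journalled path (already-missing files are fine), then
 * truncate the journal.  Safe to run repeatedly. */
static int wq_run(const char *journal)
{
    char line[4096]; /* paths longer than this are not handled */
    FILE *f = fopen(journal, "r");
    if (!f)
        return errno == ENOENT ? 0 : -1;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';
        if (line[0] && remove(line) != 0 && errno != ENOENT) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    f = fopen(journal, "w"); /* all items done: empty the journal */
    if (!f)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}

/* Journal `path` first, then create it exclusively and write `len`
 * bytes of (already decompressed) pristine contents into it. */
static int spill_to_temp(const char *journal, const char *path,
                         const void *data, size_t len)
{
    assert(journal && path);
    if (wq_add_delete(journal, path) != 0)
        return -1;
    FILE *f = fopen(path, "wbx"); /* C11: fail if the file exists */
    if (!f)
        return -1;
    size_t n = fwrite(data, 1, len, f);
    int rc = fclose(f);
    return (n == len && rc == 0) ? 0 : -1;
}
```

On normal completion the caller simply calls remove(path) once the external tool is done. A stale journal entry is harmless because wq_run() ignores files that no longer exist, so the next cleanup pass (or the next operation that runs the queue) removes anything a crash left behind.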


>    The patch does that kind of plumbing to some extent, but that part of the
>    work is not complete.  The starting point is around wc_db_pristine.c:
>    svn_wc__db_pristine_get_path().
>
>  - Using BLOBs to store the pristine contents didn't have a measurable impact
>    on the speed of WC operations such as checkout in my experiments on
>    Windows.  These experiments were not comprehensive, and I also didn't run
>    the tests on *nix.

I wouldn't expect much change in performance but would expect better use
of the disk, as explained above.

>  - There's also the deprecated svn_wc_get_pristine_copy_path() public API that
>    would require plumbing to maintain compatibility; the patch handles it by
>    spilling the pristine contents into a temporary file whose lifetime
>    is attached to the `result_pool`.

Ack; that's one reasonable definition of "lifetime." But I suspect that
any users of that function expect the pristine file to survive at least
to the next WC cleanup.
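To make the difference between the two lifetimes concrete, here is a C sketch of tying a spilled file to a pool. The mini_pool_t type and its functions are hypothetical stand-ins for an APR pool with registered cleanups (in Subversion, apr_pool_cleanup_register() fills this role); this is not the patch's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal "pool": a stack of cleanup callbacks that run
 * when the pool is destroyed. */
typedef struct cleanup {
    void (*fn)(void *data);
    void *data;
    struct cleanup *next;
} cleanup_t;

typedef struct { cleanup_t *head; } mini_pool_t;

static int pool_register(mini_pool_t *pool, void (*fn)(void *), void *data)
{
    assert(pool && fn);
    cleanup_t *c = malloc(sizeof *c);
    if (!c)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = pool->head; /* run in reverse order of registration */
    pool->head = c;
    return 0;
}

static void pool_destroy(mini_pool_t *pool)
{
    while (pool->head) {
        cleanup_t *c = pool->head;
        pool->head = c->next;
        c->fn(c->data);
        free(c);
    }
}

static void remove_file_cleanup(void *data)
{
    remove((const char *)data);
}

/* Write pristine bytes to `path` and delete the file when `pool` is
 * destroyed.  `path` must stay valid until then. */
static int spill_with_pool_lifetime(mini_pool_t *pool, const char *path,
                                    const void *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || n != len) {
        remove(path);
        return -1;
    }
    if (pool_register(pool, remove_file_cleanup, (void *)path) != 0) {
        remove(path);
        return -1;
    }
    return 0;
}
```

A caller that keeps using the returned path after destroying the pool finds the file gone, which is exactly the compatibility risk raised above for callers expecting the file to last until the next WC cleanup.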

>  (I probably won't be able to continue the work on this patch in the near
>  future; posting this in case it might be useful.)

Thanks, it definitely is useful!

-- Brane

