From: Branko Čibej <brane@apache.org>
To: dev@subversion.apache.org
Subject: Re: [PATCH] Proof of concept of the better-pristines (LZ4 + storing small pristines as BLOBs) (Was: Re: svn commit: r1843076)
Date: Tue, 23 Oct 2018 18:12:53 +0200
Organization: The Apache Software Foundation

On 22.10.2018 22:14, Evgeny Kotkov wrote:
> Branko Čibej writes:
>
>> Still missing is a mechanism for libsvn_wc (and possibly
>> libsvn_client) to determine the capabilities of the working copy at
>> runtime (this will be needed for deciding whether to use compressed
>> pristines).
> FWIW, I tried the idea of using LZ4 to compress the pristines and storing small
> pristines as blobs in the `PRISTINE` table. I was particularly interested in
> how such a change would affect the performance and what kind of obstacles
> would have to be dealt with.

Nice! I did some simpler tests by compressing exported trees, but this is
definitely better.

> In the attachment you will find a more or less functional implementation of
> this idea that might be useful to some extent. The patch is a proof of
> concept: it doesn't include the WC compatibility bits and most certainly
> doesn't have everything necessary in place. But in the meantime, I think
> that it might give a good approximation of what can be expected from the
> approach.
>
> The patch applies to the `better-pristines` branch.
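A minimal Python sketch of the storage scheme described above may help make it concrete. It is not the patch: zlib stands in for LZ4 (the Python standard library has no LZ4 binding), and the `pristine` table, the `INLINE_LIMIT` cutoff and the function names are hypothetical. Each pristine is keyed by its SHA-1 checksum, compressed when that saves space, and kept in the database as a BLOB when the stored form is small, or written to a file otherwise. Callers then read through `read_pristine()` instead of opening a path directly.

```python
import hashlib
import os
import sqlite3
import zlib

# Hypothetical cutoff: stored forms smaller than this go into the database
# as BLOBs; larger ones are written to individual files.
INLINE_LIMIT = 4096


def open_store(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # Illustrative schema only; this is not the real PRISTINE table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pristine ("
        " checksum TEXT PRIMARY KEY,"     # SHA-1 of the uncompressed text
        " size INTEGER NOT NULL,"         # uncompressed size
        " compressed INTEGER NOT NULL,"   # 1 if the stored form is compressed
        " content BLOB)"                  # NULL when stored as a file
    )
    conn.commit()
    return conn


def install_pristine(conn: sqlite3.Connection, pristine_dir: str, data: bytes) -> str:
    checksum = hashlib.sha1(data).hexdigest()
    packed = zlib.compress(data)
    # Keep the compressed form only if it is actually smaller.
    compressed = len(packed) < len(data)
    payload = packed if compressed else data
    if len(payload) < INLINE_LIMIT:
        blob = payload                    # small: store inline as a BLOB
    else:
        blob = None                       # large: store as a separate file
        with open(os.path.join(pristine_dir, checksum), "wb") as f:
            f.write(payload)
    conn.execute(
        "INSERT OR IGNORE INTO pristine VALUES (?, ?, ?, ?)",
        (checksum, len(data), int(compressed), blob),
    )
    conn.commit()
    return checksum


def read_pristine(conn: sqlite3.Connection, pristine_dir: str, checksum: str) -> bytes:
    row = conn.execute(
        "SELECT compressed, content FROM pristine WHERE checksum = ?",
        (checksum,),
    ).fetchone()
    if row is None:
        raise KeyError(checksum)
    compressed, blob = row
    if blob is None:
        with open(os.path.join(pristine_dir, checksum), "rb") as f:
            blob = f.read()
    # Always hand back the uncompressed text, whatever the storage form.
    return zlib.decompress(blob) if compressed else bytes(blob)
```

The cutoff is a judgment call; a value around the file-system cluster size is one plausible starting point, since below that a separate file wastes the remainder of its last cluster.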
>
> A couple of observations:
>
> - As expected, the combined size of the pristines is halved when the data
> itself is compressible, thus making the working copy 25% smaller.

Yes, that was my observation as well. In fact, though, storing small BLOBs
in the database itself should have an even better effect: the space a file
actually uses on disk is rounded up to a whole number of clusters, whereas
SQLite packs small BLOBs together into shared database pages.

> - A variety of callers currently access the pristine contents by reading
> the corresponding files. That doesn't work in the case of compressed
> pristines or pristines stored as BLOBs.
>
> I think that ideally we would want to use streams as much as possible, and
> only spill the uncompressed pristine contents to temporary files when we
> need to pass them to external tools, etc.; and that temporary files need
> to be backed by a work queue to avoid leaving them in place in case of an
> application crash.

Yes and yes. Keeping those temporary spilled files on disk could turn out
to be a problem; finding a reasonable time to delete them without having
to run cleanup will be rather important, I think.

> The patch does that kind of plumbing to some extent, but that part of the
> work is not complete. The starting point is around wc_db_pristine.c:
> svn_wc__db_pristine_get_path().
>
> - Using BLOBs to store the pristine contents didn't have a measurable impact
> on the speed of WC operations such as checkout in my experiments on
> Windows. These experiments were not comprehensive, and I also didn't run
> the tests on *nix.

I wouldn't expect much change in performance, but I would expect better use
of the disk, as explained above.

> - There's also the deprecated svn_wc_get_pristine_copy_path() public API that
> would require plumbing to maintain compatibility; the patch does this by
> spilling the pristine contents into a temporary file whose lifetime
> is attached to the `result_pool`.
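The spill-to-temporary-file approach quoted above can be sketched in Python as follows. Everything here is hypothetical and illustrative, not Subversion API: a `work_queue` table in SQLite stands in for the WC work queue, and a toy `Pool` class stands in for an APR pool whose cleanups run when it is cleared. Each spilled file is recorded in the queue before it is created, deleted when its owning pool is cleared, and otherwise removed by the next cleanup pass (`WorkQueue.run()`), so an application crash leaves nothing untracked.

```python
import os
import sqlite3
import uuid


class WorkQueue:
    """Hypothetical persistent queue of temporary files awaiting deletion."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS work_queue ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT NOT NULL)"
        )
        conn.commit()

    def schedule_delete(self, path: str) -> int:
        cur = self.conn.execute("INSERT INTO work_queue (path) VALUES (?)", (path,))
        self.conn.commit()
        return cur.lastrowid

    def complete(self, item_id: int) -> None:
        self.conn.execute("DELETE FROM work_queue WHERE id = ?", (item_id,))
        self.conn.commit()

    def run(self) -> None:
        """Cleanup pass: delete every file still recorded (e.g. after a crash)."""
        rows = self.conn.execute("SELECT id, path FROM work_queue").fetchall()
        for item_id, path in rows:
            _remove_if_exists(path)
            self.complete(item_id)


class Pool:
    """Toy stand-in for an APR pool: registered cleanups run on clear()."""

    def __init__(self):
        self._cleanups = []

    def register_cleanup(self, fn) -> None:
        self._cleanups.append(fn)

    def clear(self) -> None:
        # Run cleanups in reverse order of registration.
        while self._cleanups:
            self._cleanups.pop()()


def _remove_if_exists(path: str) -> None:
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def spill_to_tempfile(data: bytes, tmp_dir: str, wq: WorkQueue, result_pool: Pool) -> str:
    """Write uncompressed pristine data to a temp file owned by result_pool."""
    path = os.path.join(tmp_dir, uuid.uuid4().hex + ".tmp")
    # Record the path before the file exists, so a crash at any later point
    # leaves a queue entry for the next cleanup pass to act on.
    item_id = wq.schedule_delete(path)
    with open(path, "xb") as f:
        f.write(data)

    def cleanup() -> None:
        _remove_if_exists(path)
        wq.complete(item_id)

    result_pool.register_cleanup(cleanup)
    return path
```

Tying deletion to `Pool.clear()` mirrors the `result_pool` lifetime; callers that expect the file to persist until the next WC cleanup would instead skip the pool cleanup and rely on `WorkQueue.run()` alone.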
Ack; that's one reasonable definition of "lifetime." But I suspect that any
users of that function expect the pristine file to survive at least until
the next WC cleanup.

> (I probably won't be able to continue the work on this patch in the near
> future; posting this in case it might be useful.)

Thanks, it definitely is useful!

-- Brane