Message-ID: <56AA5FF2.6040906@apache.org>
Date: Thu, 28 Jan 2016 19:37:38 +0100
From: Stefan Fuhrmann
Organization: Apache Software Foundation
To: users@subversion.apache.org, Gert Kello
Subject: Re: Svn 1.9 repository 20% bigger than svn 1.8 repository

On Thu, 28 Jan 2016 11:54:14 +0200, you wrote:

> I have a svn 1.9 repository, created with svnsync, that has ~150000
> revisions and size about 45 GB.

300kB/rev is quite large, like >1 MB of changes before compression - on
average. Are these office documents, large xml / html files or simply
many files per commit?

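As a quick check of the per-revision figure, the arithmetic can be sketched as below. The numbers are the ones quoted in the thread; treating 1 GB as 10**9 bytes is an assumption, and the function name is only illustrative.

```python
# Figures quoted in the thread. Decimal units (1 GB = 10**9 bytes)
# are an assumption made for this sketch.
REPO_SIZE_BYTES = 45 * 10**9
REVISIONS = 150_000


def bytes_per_revision(total_bytes: int, revisions: int) -> float:
    """Average on-disk size per revision, in bytes."""
    return total_bytes / revisions


avg = bytes_per_revision(REPO_SIZE_BYTES, REVISIONS)
print(f"{avg / 1000:.0f} kB per revision")  # prints "300 kB per revision"
```
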
> Due to some issues in svn-all-fast-export I
> wanted to have svn 1.8 version repository so I downgraded it by doing
> svnadmin (v 1.9) dump /svnadmin (v 1.8) load cycle. I was surprised that
> the size of v 1.8 repository is "only" 37.5 GB
> I tried to compare content of db\revs folder: some files are bigger in 1.8
> repo, some in 1.9 repo.

For the record: you already said elsewhere in this thread that you
used 1.8 to create the 1.8 repo and 1.9 for the 1.9. I also assume
standard settings as in "no fsfs.conf tweaks".

> Now I'm wondering:
> 1. Is such size increase expected for 1.9 repository? I read that 1.9 was
> aimed at speed optimizations, but 20% size increase compared to 1.8 sounds
> pretty big...

A 20% increase is definitely unexpected, +/-5% being a more typical
number. It is not entirely implausible, though. Here is how 1.9
differs from 1.8:

* 1.9 adds "index" data to the rev / pack files, allowing for
  slightly shorter data elsewhere. The typical net effect is
  +5% in size.
* 1.9 adds some padding at the end of each block (64k boundary
  by default) to avoid parsed data crossing block boundaries.
  Net effect typ. +1%.
* 1.9 will use skip-deltas between shards where 1.8 would still
  use "linear" deltification. Net effect typ. +2%.
* 1.9 will store deltas against very small files or directories.
  Net effect typ. <1%.
* 1.9 now supports representation sharing for node properties.
  Net effect typ. 0..-5%.
* 1.9 now supports representation sharing when committing the
  same data to multiple paths / branches within the same
  revision. Net effect typ. 0..-5%.

The theme behind these changes is I/O reduction: Maximize data
sharing, enable reordering of repo data upon pack and avoid
"pointer chasing" for small pieces of information.

> 2. Or is my "dumped and reloaded 1.8" broken somehow? How could I verify?
> (dump revisions one by one and compare? Or is there any better way?)

There is a simple way to compare the "content size" of your repositories.
Run the 1.9 svnfsfs tool on both:

  svnfsfs stats -M 1000 /path/to/repo > /some/output/path

It basically reads the whole repository, groups and aggregates
the item sizes and produces a long report. The numbers of changes
and node revisions should be more or less (exactly?) the same in
both reports. If they are, you'll be good.

"Representation" size is where the numbers will differ. Looking
at the differences in detail, you should be able to pin down
one or two file extensions that account for most of the increase.
It would be interesting to learn what is special about them ...

-- Stefan^2.
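One way to look at the differences between the two `svnfsfs stats` reports is a plain line-by-line diff. The sketch below makes no assumption about the report layout; it only treats both outputs as text files. The function name and the file names in the usage note are hypothetical.

```python
import difflib
import sys
from pathlib import Path
from typing import List


def diff_reports(report_a: str, report_b: str) -> List[str]:
    """Return unified-diff lines between two text reports.

    Both arguments are paths to files previously written with
    'svnfsfs stats ... > file'. An empty list means the reports
    are identical.
    """
    a_lines = Path(report_a).read_text().splitlines()
    b_lines = Path(report_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            a_lines, b_lines, fromfile=report_a, tofile=report_b, lineterm=""
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Print the diff when called with exactly two report paths.
    print("\n".join(diff_reports(sys.argv[1], sys.argv[2])))
```

Saved under a name of your choosing, it would be called with the two report files, e.g. `python diff_reports.py stats-1.8.txt stats-1.9.txt`. Lines that differ in the change and node revision counts would be a reason to look closer; differences in the "Representation" sizes are expected.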