From: Ashod Nakashian
Date: Wed, 4 Apr 2012 02:39:05 -0700 (PDT)
Subject: Re: Compressed Pristines (Summary)
To: Markus Schaber, "julianfoad@btopenworld.com", "mtherieau@gmail.com"
Cc: Subversion Development

Combined response inline...

> From: Markus Schaber
>
> First, thanks for your great summary. I'll throw in just my 2 cents below.

The pleasure is mine.

> From: Markus Schaber
>
> Were any of those tests actually executed on a file system supporting something like "block suballocation", "tail merging" or "tail packing"?

No, not to my knowledge. Mine were run on standard installations of Ubuntu 11.10, and I was trying to calculate the waste on a system that *didn't* have them enabled.

> From: Markus Schaber
>
> Today, I was rather surprised that my pristine subdir of one of our main projects which contains 726 MB of data has an actual disk size of 759 MB, which leads to an overhead of less than 4% due to block-size rounding. (According to the Explorer "Properties" dialog of Win 7 on a NTFS file system.)

Did you have NTFS compression enabled?

> From: Markus Schaber
>
> AFAICS, "modern" file systems increasingly support that kind of feature[1], so we should at least think about how much effort we want to throw at the "packing" part of the problem if it's likely to vanish (or, at least, be drastically reduced) in the future.
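
For anyone who wants comparable numbers, here is a rough sketch of how the block-rounding waste discussed above could be estimated. It is not the script behind any figure in this thread. The 4096-byte block size, the helper names, and the example path are all assumptions, and the model ignores tail packing, file-system compression, and NTFS keeping very small files inside the MFT.

```python
import os


def rounded_size(size, block_size=4096):
    """Bytes occupied if a file of `size` bytes is stored in whole blocks.

    Assumption: an empty file occupies no data blocks.
    """
    if size == 0:
        return 0
    # Ceiling division, then scale back up to a multiple of block_size.
    return -(-size // block_size) * block_size


def rounding_overhead(sizes, block_size=4096):
    """Return (data_bytes, on_disk_bytes, overhead_fraction) for file sizes."""
    sizes = list(sizes)  # allow generators; we iterate twice
    data = sum(sizes)
    on_disk = sum(rounded_size(s, block_size) for s in sizes)
    overhead = (on_disk - data) / data if data else 0.0
    return data, on_disk, overhead


def file_sizes(root):
    """Yield the size of every regular, non-symlink file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield os.path.getsize(path)
```

Something like `rounding_overhead(file_sizes(".svn/pristine"))` would then give the data size, the estimated on-disk size, and the overhead as a fraction of the data size. The path is only an example; adjust it and the block size to the working copy and file system being measured.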

[snip]

> From: Mark Therieau
>
> Another thought would be to pursue a FUSE-like approach similar to scord [1][2]
[snip]

> From: Julian Foad
>
> 1. Filesystem compression.
>
> Would you like to assess the feasibility of compressing the pristine store by re-mounting the "pristines" subdirectory as a compressed subtree in the operating system's file system?

No :-)

There are two ways to answer this interesting proposition of compressed file systems. The obvious one is that it isn't something SVN can or should control. The file system, and certainly the system drivers, are up to the user, and any requirement or suggestion of tampering with them is decidedly unwarranted and unexpected from a VCS.

The second is more relevant, however. The user may *still* enable these schemes with or without compressed pristine support. After all, we are only concerned with the pristine store and *not* the working copy. So there is still room for these technologies, if and when the user feels inclined to use them.

So I'd say nothing prevents the user from using them, at their own responsibility, to get further disk savings. At the same time, they are markedly out of scope for the compressed pristines feature, if not for SVN as a system.

> From: Markus Schaber
>
> Additionally, the simple and efficient way of storing the pristines in a SQLite database (one blob per file) also prevents us from exploiting inter-file redundancies during compression, while adding a packing layer on top of sqlite leads to both high complexity and a large average blob size, and large blobs are probably more efficiently handled by the FS directly.

Yes.
That's what the proposal I drafted is claiming.

> From: Markus Schaber
>
> To cut it short: I'll "take" whatever solution emerges, but my gut feeling tells me that we should use plain files as containers, instead of using sqlite.
>
> The other aspects (grouping similar files into the same container before compression, applying a size limit for containers, and storing uncompressible files in uncompressed containers) are fine as discussed.
>
> I'll try to run some statistics using publicly available projects on an NTFS file system, just for comparison.

That would be great. Please share your findings.

> From: Mark Therieau
>
> If the full goal is to reduce pressure on the underlying file system in the presence
> of many large working copies (e.g. one per branch) then duplicate pristine contents,
> even with super-awesome compression would not match the space savings of a
> de-duplicated, pristine-aware, copy-on-write file system.

That's assuming there are many duplicates. This is certainly possible, especially with many branches/tags checked out from the same source. But I suspect it's a more common scenario to have a single branch checked out from different repositories. In other words, unless we have solid numbers showing that de-duplication saves more, the working assumption is that improving a single branch by compression will be more useful to more users.
Plus, your suggestion is probably part of the unified pristine store (aka ~/.svn), which is out of scope for compressed pristines.

> From: Julian Foad
>
> The pristine store implementation also needs to provide *uncompressed* copies of the files. Some of the API consumers can and should read the data through svn_stream_t; this is the easy part. Other API consumers -- primarily those that invoke an external 'diff' tool -- need to be given access to a complete uncompressed file on disk.

This is certainly a -minor- complication we'll have to deal with. It's just a technicality, not a show stopper or a problem per se. The pristine/tmp folder could be cleaned up via svn cleanup, for example, or at different checkpoints. The worst-case scenarios are either to clutter the disk with too many temporary uncompressed pristines or to delete them prematurely and force the user to re-run their last command. These aren't fatal, and it's easy to find a middle ground to handle them.

-Ash