From: David Weintraub
To: PINKERP1@nationwide.com
Cc: users@subversion.apache.org
Subject: Re: Subversion equivalent to VSS diff for binary files
Date: Mon, 18 Oct 2010 09:47:32 -0400

Subversion does handle binary files without any problems. It uses the svn:mime-type property on a file to mark it as binary, so it knows not to attempt a text merge on that file. Subversion does a good job of handling binaries.

However, there is one issue that makes storing binary files in Subversion problematic: Subversion doesn't have an easy way to remove individual revisions of particular files.

Normally, with text files, this isn't an issue. Text files are stored as diffs, and removing a particular revision of a text file won't save much room in the repository. Most people don't bother removing text revisions unless a revision contains inappropriate or proprietary information that they don't want kept around.

However, binary files are a bit different. Changing one line in a source file and then compiling it may cause a cascade of changes, so the difference between the previous revision of the binary and the current one can be huge.
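
To make the cascade concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `toy_build` is a hypothetical stand-in for a compile-and-link step, not how any real compiler or Subversion's storage works. It emits one record per source line and embeds each record's absolute offset, so lengthening a single line shifts every offset after it.

```python
import struct


def toy_build(source_lines):
    """Hypothetical 'build': one record per line, prefixed with its absolute offset.

    This is an illustration only, not a real compiler or linker.
    """
    records = []
    offset = 0
    for line in source_lines:
        payload = line.encode("utf-8")
        # 4-byte big-endian offset followed by the line's bytes.
        records.append(struct.pack(">I", offset) + payload)
        offset += 4 + len(payload)
    return records


def count_changed(old, new):
    """Count the positions at which two equally long sequences differ."""
    return sum(1 for a, b in zip(old, new) if a != b)


# Made-up source: 100 one-line functions.
source_v1 = [f"int f{i}() {{ return {i}; }}" for i in range(100)]
source_v2 = list(source_v1)
source_v2[0] = "int f0() { return 1000; }"  # a one-line edit that adds 3 bytes

changed_source = count_changed(source_v1, source_v2)
changed_binary = count_changed(toy_build(source_v1), toy_build(source_v2))

print("source lines changed:", changed_source)
print("built records changed:", changed_binary)
```

In this toy, the one-line edit changes one line of source but every record of the built output, because each later record carries a shifted offset. That is the kind of effect that makes a revision-to-revision difference of a built artifact so much larger than the difference of its source.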

Storing revisions of a binary file in ANY revision control system takes up a great deal of space compared with storing revisions of text. At many sites, the built objects are stored under revision control, maybe for every single build. Do this for a while, and you'll chew up a lot of disk space.

To handle this, many sites have a way of identifying obsolete binary revisions and destroying them. I remember several papers at Perforce conferences on this very topic. (The idea was to remove the space-hogging binary contents without destroying the revision itself. That way, you'd still see the history, but not have access to the file contents.)

So, the best thing to do is not to store binary files when you don't have to. Binaries get stored for several reasons. Some are:

1. Not being sure that you can repeat your build process, so you want to keep the binary revision "just in case". The solution is to create a repeatable build process, so you don't have to store the binaries.

2. Storing releases. A very common tactic, but revision control systems aren't really ideal for this anyway. Most people who need to grab the releases aren't necessarily developers, so using a revision control system to get the release they want simply adds complexity. A better way is to use a release repository system.

3. Storing third party artifacts. This is usually not a space issue, since it is unlikely you'll be storing a hundred revisions of a particular third party binary. You might update a third party binary once or twice a year. The problem with this (which is a problem with every revision control system out there) is that you quickly lose the true identity of the third party revision. This happens all the time with Jar files. Is that log4j.jar revision 1.2.3 or 1.4.6? How do I know? In the end, you'll end up with a pile of unidentifiable and probably obsolete third party binaries.
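
One partial aid for the "which log4j.jar is this?" question: a Jar is a zip archive, and its META-INF/MANIFEST.MF may carry an Implementation-Version attribute. The Python sketch below reads it. The function names and the demo Jar are made up for illustration, many real Jars omit the attribute, and the parser ignores manifest continuation lines and per-entry sections, so treat it as a rough check rather than a reliable identifier.

```python
import io
import zipfile


def jar_implementation_version(jar_bytes):
    """Return Implementation-Version from the Jar's manifest, or None if absent.

    Simplified: scans every "Name: value" line and ignores continuation lines.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            # The archive has no manifest at all.
            return None
    for line in manifest.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "implementation-version":
            return value.strip()
    return None


def make_demo_jar(version):
    """Build a tiny in-memory Jar for the demo (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr(
            "META-INF/MANIFEST.MF",
            f"Manifest-Version: 1.0\r\nImplementation-Version: {version}\r\n",
        )
    return buf.getvalue()


demo_jar = make_demo_jar("1.2.3")
print(jar_implementation_version(demo_jar))
```

When the function returns None, you are back to guessing, which is the situation described above.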

Considering that the whole purpose of revision control is identifying what is in your software, having a wad of unknown third party binaries isn't a great way to accomplish this. The true solution is to use a release repository system. If you use Ant, it's quite simple to incorporate Ivy, and once that's done, developers are usually quite happy with the results. Even if you aren't working on a Java project, you should use some sort of release repository for this type of stuff.

Should you ever store binary files in Subversion? Of course, but only when it is really the best way to handle the problem. If your source code includes JPGs and GIFs, or you include a Word document in your release, there isn't really an alternative to storing the binary file. Space isn't an issue, since these files are relatively small and aren't updated that frequently. (Compare a few megabytes of a Word document that you update once per month with a 1.5 gigabyte build that you produce two to three times per day.)

So, there are two sides to the binaries-in-Subversion story:

Yes, Subversion handles binaries just as well as other revision control systems. Some say even better. Subversion knows which files are binary by using the svn:mime-type property. In fact, unlike many version control systems, Subversion can actually distinguish between binary types, and it is possible via third party tools to diff binary files (for example, two Word documents).

No, Subversion doesn't allow easy pruning of space-hogging binaries, and therefore it can cause problems in that respect. If you're using a revision control system and now have a policy of removing obsolete binaries on a regular basis, you'll have problems continuing this with Subversion. However, if this is a problem, it's more likely due to using your revision control system incorrectly (being unable to rebuild older binaries in a consistent manner, or using your version control system as a release repository).
The solution would be to fix the underlying problem rather than to avoid Subversion.

--
David Weintraub
qazwart@gmail.com