From: Gregory Szorc
Date: Tue, 8 Nov 2016 10:58:23 -0800
Subject: Version control advice
To: dev@netbeans.apache.org

I'm a Mercurial developer who is also responsible for running https://hg.mozilla.org/ and supporting Mercurial at Mozilla. I understand NetBeans is contemplating its version control future because the ASF only supports Subversion and Git. I think I've learned some things that may be helpful to you.

First, the NetBeans "main" repo is on the same order of magnitude as (but marginally smaller than) the Firefox repository in terms of file count and repository data size. So generally speaking, what I have learned supporting Firefox can apply to NetBeans.

While I understand Mercurial may not be in your future, I'd like to point out that hg.netbeans.org is running a very old and very slow version of Mercurial (likely a release from before July 2010). The high volume of merge commits in the "main" repo contributes to highly sub-optimal storage utilization in old versions of Mercurial. This makes clones and pulls significantly slower because there is more data to transfer, and it puts significant CPU load on the server, which has to read and encode the sub-optimal storage format. I wouldn't be surprised if you have CPU load issues on the server.

As it is stored today, the "main" repository is almost exactly 3 GB. If you create a new repository using Mercurial 3.7 or newer (so "generaldelta" is the default storage format) and configure it to recalculate optimal deltas, the repository size drops to ~1.1 GB. This can be done as follows:

  $ hg init main-optimal
  $ cd main-optimal
  $ hg --config format.generaldelta=true --config format.aggressivemergedeltas=true pull https://hg.netbeans.org/main

Now, for your VCS future.

I'm a huge proponent of monorepos for productivity reasons. I've seen discussion on this list about splitting the repo. I would discourage that. I'd encourage you to read https://danluu.com/monorepo/ and the linked articles at the bottom for more on the topic.

Unfortunately, one of the practical concerns about monorepos is that they don't scale with some version control tools, namely Git. This leads many to let deficiencies in tools drive workflow decisions, which is quite unfortunate because tools should enhance productivity, not hinder it.

If NetBeans uses Git and maintains the "main" repo as is, I believe you'll experience the following performance issues now or in the future as the repository keeps growing:

* You'll constantly be dealing with CPU explosions on the Git server caused by clients performing clones and large pulls.
  GitHub uses a server infrastructure that caches certain operations related to packfiles to help mitigate this. I'm not sure of the state of ASF's Git server.
* In many cases, shallow clones can require more CPU on the Git server to process than full clones. This is because the server essentially has to read objects from packs and repack things instead of taking a fast path that effectively streams a packfile to the client.
* Garbage collection could be problematic on the server and client.

Now, Git is constantly improving, so these problems may not always exist. And as well as GitHub scales - better than a vanilla Git install - it isn't a silver bullet. On a few occasions, processes at Mozilla have overwhelmed GitHub and resulted in GitHub disabling access to repositories! That hasn't happened in a while though (partially through them scaling better and partially through us learning our lesson and not pointing hundreds of machines at large Git repos). I'm not sure what, if anything, ASF's Git server has done to mitigate load from large repositories.

It's worth noting that while some of the server-side CPU issues exist in default Mercurial installations, there are mitigations. The "clonebundles" extension allows a server to advertise pre-generated "bundle" files of repository content. When a client clones, it downloads a large bundle from a static file server, then goes back to the Mercurial server and gets the data changed since the bundle was created. If you `hg clone https://hg.mozilla.org/mozilla-unified` with a modern Mercurial client, your client will grab a 1+ GB file from a CDN and our servers will spend maybe 5s of total CPU to service the clone. The clones are faster for clients and the server can scale clones nearly infinitely. It is a win all around.

Anyway, Mercurial's ability to scale doesn't help you if your choices are Subversion or Git :/

Given those choices, I would lean towards Subversion if you want to maintain the "main" repo as is.
If you use the "main" repo as is with Git, you should really do due diligence with the Git server operator to make sure they won't be overwhelmed. If you split the "main" repo, go with Git if your users prefer Git over Subversion.

A compromise option would be to keep everything in a monorepo in Subversion and have separate Git repositories for specific subdirectories or "views." This is often a win-win but requires a bit of tooling to do the syncing. Speaking of syncing, it should be unidirectional: bi-directional syncing of anything is a hard problem, and take it from someone who has hacked on bi-directional VCS syncing that it is not something you want to support. Instead, I recommend abstracting the process of "pushing to the canonical repo" into something a machine does, and having it perform the VCS conversion to the canonical repo and do the actual push. For example, landing something from Git would have a server fetch that Git ref and replay the commits as Subversion commits (or squash and commit to preserve atomicity).

Anyway, I think this wall of text is long enough. Reply if you have any questions.

Gregory