incubator-netbeans-dev mailing list archives

From Emilian Bold <emilian.b...@gmail.com>
Subject Re: Version control advice
Date Fri, 11 Nov 2016 06:33:46 GMT
Thank you for following through with this after we talked on IRC.

I will check the size reduction for the releases/ repo later.

On Fri, 11 Nov 2016 at 07:45, Gregory Szorc <gregory.szorc@gmail.com> wrote:

> I'm a Mercurial developer who is also responsible for running
> https://hg.mozilla.org/ and supporting Mercurial at Mozilla. I understand
> NetBeans is contemplating its version control future because the ASF only
> supports Subversion and Git. I think I've learned some things that may be
> helpful to you.
>
> First, the NetBeans "main" repo is on the same order of magnitude as (but
> marginally smaller than) the Firefox repository in terms of file count and
> repository data size. So generally speaking, what I have learned supporting
> Firefox can apply to NetBeans.
>
> While I understand Mercurial may not be in your future, I'd like to point
> out that hg.netbeans.org is running a very old and very slow version of
> Mercurial (likely a release from before July 2010). The high volume of
> merge commits in the "main" repo contributes to highly sub-optimal storage
> utilization in old versions of Mercurial. This makes clones and pulls
> significantly slower due to more data to transfer and contributes to
> significant CPU load on the server to read/encode the sub-optimal storage
> encoding. I wouldn't be surprised if you have CPU load issues on the
> server.
>
> As it is stored today, the "main" repository is almost exactly 3 GB. If you
> create a new repository with optimal storage encoding, using Mercurial 3.7
> or newer (where "generaldelta" is the default storage format) and
> configuring it to recalculate optimal deltas, the repository size drops to
> ~1.1 GB. This can be done as follows:
>
>    $ hg init main-optimal
>    $ cd main-optimal
>    $ hg --config format.generaldelta=true \
>         --config format.aggressivemergedeltas=true \
>         pull https://hg.netbeans.org/main
>    <wait a long time>
>
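To make the storage point above concrete, here is a toy Python model (not Mercurial's revlog code) of why interleaved, branchy history bloats an old-style revlog. The behaviour it models is that pre-generaldelta revlogs always store a revision as a delta against the revision stored just before it, while generaldelta lets a revision delta against its parent. The cost model (one unit per added or removed line) and all names are hypothetical, and it ignores full snapshots, compression, and the merge-specific `aggressivemergedeltas` choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    lines: frozenset          # toy snapshot of one file: a set of lines
    parent: Optional[int]     # index of the parent revision; None for the root


def delta_size(old: frozenset, new: frozenset) -> int:
    # Hypothetical cost model: one unit per line added or removed.
    return len(old ^ new)


def interleaved_branches(n: int) -> list:
    """Two branches forked from a common base, stored in interleaved order
    (A1, B1, A2, B2, ...), as happens when parallel work lands in one repo."""
    base = frozenset(f"common {i}" for i in range(50))
    revs = [Revision(base, None)]
    tip_a = tip_b = 0
    for k in range(1, n + 1):
        revs.append(Revision(revs[tip_a].lines | {f"a{k}"}, tip_a))
        tip_a = len(revs) - 1
        revs.append(Revision(revs[tip_b].lines | {f"b{k}"}, tip_b))
        tip_b = len(revs) - 1
    return revs


def previous_revision_cost(revs: list) -> int:
    # Pre-generaldelta: each revision is a delta against the revision
    # stored immediately before it, whether related or not.
    return sum(delta_size(revs[i - 1].lines, revs[i].lines)
               for i in range(1, len(revs)))


def parent_delta_cost(revs: list) -> int:
    # generaldelta-style: each revision is a delta against its own parent.
    return sum(delta_size(revs[r.parent].lines, r.lines)
               for r in revs if r.parent is not None)


revs = interleaved_branches(10)
print(previous_revision_cost(revs), parent_delta_cost(revs))  # 210 20
```

With ten commits per branch, deltas against the previously stored revision cost 210 units versus 20 for parent deltas, and the gap grows quadratically with branch length. This only illustrates the mechanism; it is not a prediction of real savings.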
> Now, for your VCS future.
>
> I'm a huge proponent of monorepos for productivity reasons. I've seen
> discussion on this list about splitting the repo. I would discourage that.
> I'd encourage you to read https://danluu.com/monorepo/ and the linked
> articles at the bottom for more on the topic.
>
> Unfortunately, one of the practical concerns about monorepos is they don't
> scale with some version control tools, namely Git. This leads many to let
> deficiencies in tools drive workflow decisions, which is quite unfortunate
> because tools should enhance productivity, not hinder it. If NetBeans uses
> Git and maintains the "main" repo as is, I believe you'll experience the
> following performance issues now or in the future as the repository keeps
> growing:
>
> * You'll constantly be dealing with CPU explosions on the Git server
> generated from clients performing clones and large pulls. GitHub uses a
> server infrastructure that caches certain operations related to packfiles
> to help mitigate this. I'm not sure of the state of ASF's Git server.
>
> * In many cases, shallow clones can require more CPU on the Git server to
> process than full clones. This is because the server essentially has to
> read objects from packs and repack things instead of taking a fast path
> that effectively streams a packfile to the client.
>
> * Garbage collection could be problematic on both the server and the client.
>
> Now, Git is constantly improving, so these problems may not always
> exist. And as well as GitHub scales (better than a vanilla Git install),
> it isn't a silver bullet. On a few occasions, processes at Mozilla have
> overwhelmed GitHub and resulted in GitHub disabling access to
> repositories! That hasn't happened in a while, though (partly through
> them scaling better and partly through us learning our lesson and not
> pointing hundreds of machines at large Git repos). I'm not sure what, if
> anything, ASF's Git server has done to mitigate load from large
> repositories.
>
> It's worth noting that while some of the server-side CPU issues exist in
> default Mercurial installations, there are mitigations. The "clonebundles"
> extension allows a server to advertise pre-generated "bundle" files of
> repository content. When a client clones, it downloads a large bundle from
> a static file server, then goes back to the Mercurial server to get the
> data changed since the bundle was created. If you
> `hg clone https://hg.mozilla.org/mozilla-unified` with a modern Mercurial
> client, your client will grab a 1+ GB file from a CDN and our servers will
> spend maybe 5s of total CPU time to service the clone. Clones are faster
> for clients, and the server can scale clones nearly infinitely. It's a win
> all around.
>
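The client side of clonebundles can be sketched roughly as follows. This is a simplified Python model, not Mercurial's implementation: it assumes a manifest with one `URL KEY=VALUE ...` entry per line, uses the `BUNDLESPEC` and `REQUIRESNI` attribute names, and picks the first entry the client can use. The example URLs and the `parse_manifest`/`choose_bundle` helpers are hypothetical; real clients handle more cases (e.g. sorting by `ui.clonebundleprefers` and parsing BUNDLESPEC parameters), which this sketch skips.

```python
from typing import Optional


def parse_manifest(text: str) -> list:
    """Parse manifest lines of the form: URL [KEY=VALUE ...]."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        url, attrs = fields[0], {}
        for item in fields[1:]:
            key, _, value = item.partition("=")
            attrs[key] = value
        entries.append((url, attrs))
    return entries


def choose_bundle(entries: list, supported_specs: set,
                  client_has_sni: bool = True) -> Optional[str]:
    """Return the first usable bundle URL, or None to fall back to a
    regular clone served entirely by the Mercurial server."""
    for url, attrs in entries:
        spec = attrs.get("BUNDLESPEC")
        if spec is not None and spec not in supported_specs:
            continue  # bundle format this client cannot read
        if attrs.get("REQUIRESNI") == "true" and not client_has_sni:
            continue  # host needs TLS SNI, which this client lacks
        return url
    return None


# Hypothetical manifest with two pre-generated bundles on a CDN.
MANIFEST = """\
https://cdn.example.com/main.bzip2.hg BUNDLESPEC=bzip2-v2
https://cdn.example.com/main.gzip.hg BUNDLESPEC=gzip-v2 REQUIRESNI=true
"""

entries = parse_manifest(MANIFEST)
print(choose_bundle(entries, {"gzip-v2"}))  # https://cdn.example.com/main.gzip.hg
# After applying the bundle, the client does an ordinary pull, so the
# Mercurial server only sends changesets newer than the bundle.
```

The design choice this mirrors is that the expensive part (serving gigabytes of history) becomes static file hosting, while the Mercurial server only computes the small incremental pull.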
> Anyway, Mercurial's ability to scale doesn't help you if your choices are
> Subversion or Git :/
>
> Given those choices, I would lean towards Subversion if you want to
> maintain the "main" repo as is. If you use the "main" repo as is with Git,
> you should really do due diligence with the Git server operator to make
> sure they won't be overwhelmed.
>
> If you split the "main" repo, go with Git if your users prefer Git over
> Subversion.
>
> A compromise option would be to keep everything in a monorepo in Subversion
> and have separate Git repositories for specific subdirectories or "views."
> This is often a win-win but requires a bit of tooling to do the syncing.
> Speaking of syncing, it should be unidirectional: bi-directional syncing of
> anything is a hard problem. Take it from someone who has hacked on
> bi-directional VCS syncing: it is not something you want to support.
> Instead, I recommend making "pushing to the canonical repo" something a
> machine does: it performs the VCS conversion to the canonical repo and does
> the actual push. For example, landing something from Git would have a
> server fetch that Git ref and replay the commits as Subversion commits (or
> squash them into a single commit to preserve atomicity).
>
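A minimal sketch of the one-way "landing" flow described above, in Python. Everything here is hypothetical: `GitCommit`, `CanonicalRepo`, and `land` are illustrative names, the canonical repository is an in-memory, append-only revision list standing in for Subversion, and fetching the Git ref is assumed to have already produced the commit list. What it shows is that only the landing service writes to the canonical repo, data flows from Git to the canonical repo and never back, and squashing turns a series into one atomic revision.

```python
from dataclasses import dataclass, field


@dataclass
class GitCommit:
    sha: str
    message: str
    changes: dict  # toy model: path -> new file contents


@dataclass
class CanonicalRepo:
    """Hypothetical stand-in for the canonical Subversion repository."""
    files: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # revision N is log[N - 1]

    def commit(self, message: str, changes: dict) -> int:
        # Each call produces exactly one new revision (one atomic commit).
        self.files.update(changes)
        self.log.append(message)
        return len(self.log)


def land(repo: CanonicalRepo, commits: list, squash: bool = False) -> list:
    """Replay Git commits onto the canonical repo; return new revision numbers.

    Only this service writes to the canonical repo, and nothing is synced
    back to Git, so the flow stays unidirectional.
    """
    if not commits:
        return []
    if squash:
        # One revision for the whole series: it lands entirely or not at all.
        combined = {}
        for c in commits:
            combined.update(c.changes)  # later commits win on the same path
        message = "\n".join(c.message for c in commits)
        return [repo.commit(message, combined)]
    # One revision per Git commit; record the source SHA for traceability.
    return [repo.commit(f"{c.message}\n\n(from git {c.sha[:12]})", c.changes)
            for c in commits]


series = [GitCommit("1f" * 20, "Fix parser", {"parser.txt": "v2"}),
          GitCommit("2e" * 20, "Add test", {"test.txt": "v1"})]
print(land(CanonicalRepo(), series))               # [1, 2]
print(land(CanonicalRepo(), series, squash=True))  # [1]
```

Replaying commit by commit preserves history but is not atomic (a failure midway would leave a partial landing); squashing gives up that history in exchange for atomicity.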
> Anyway, I think this wall of text is long enough. Reply if you have any
> questions.
>
> Gregory
>
