www-builds mailing list archives

From Greg Stein <gst...@gmail.com>
Subject Re: Turning Off Bb-fbsd2
Date Fri, 10 Feb 2017 00:32:42 GMT
On Thu, Feb 9, 2017 at 5:53 PM, Allen Wittenauer <aw@effectivemachines.com>
wrote:
>...

>         The Mac OS X host was shut down literally a day after I sent out
> an email to common-dev@hadoop announcing I had full build and patch
> testing working.  I had spent quite a bit of time getting Apache Yetus
> ported over to work on Apache's OS X machine, then spent over a month on
> working out the Hadoop specifics, running build after build after build.
> Competing with the Apache Mesos jobs that also ran on that box. The reason
> I was told it was killed was: "no one was using it".  (Umm, what?  Clearly
> no one bothered looking at the build log.)
>

This occurred before I started working as the Infrastructure Administrator
last fall. I don't know the full background, other than a PMC requested
that buildbot, then never used it. Yeah: maybe the build logs weren't
examined to see that other projects had hopped onto it.

I also believe we had to pay for that box, and it wasn't cheap.

Today, our preferred model for non-Ubuntu boxes is to have other people
own/run/manage those buildbots and hook them into our buildmaster. For
example, people on the Apache Subversion project have several such 'bots.

We are concentrating our in-house experience on the Ubuntu platform, from
both an operational and a cost angle. Four years ago, the Infra team had
far fewer projects to support. Today, we have hundreds of projects and
many thousands of committers to support. We've had to reallocate in order
to meet the incredible growth of the ASF.

Unfortunately, especially for yourself and some others, the "smoothing down
the edges" has been detrimental.

>         In parallel, I started working on the Solaris box.... which was
> then promptly shut down not too long after I had filed a jira to see if we
> could get the base CA certificates upgraded. (which was pretty much all I
> needed, after that I could have finished getting the Hadoop builds working
> on it as well).
>

We're still shutting down Solaris. Only one guy has experience with it, and
he's also got a ton of other stuff to do.

Our hardware that runs Solaris is also *very* old. Worse: we could never
get a support contract for it. They wouldn't sell us one (messed up, but
there it is). We really need to get that box fully shut down, unracked, and
thrown out.

>         These were huge blows to Apache Hadoop, as one of the common
> complaints amongst committers is the lack of resources to do cross platform
> testing. Given the ASF had that infrastructure in place, being in this
> position was kind of dumb of the project.  Now the machines are gone and as
> a result, the portability of the code is still very hit or miss and the ASF
> is worse for it.
>

Apache Hadoop is worse for it. As Gavin has noted, just in the past year,
we've increased our build farm dramatically. I believe the ASF is better
for it. We also have a team better focused to support the growth of the ASF.

We can all agree that turning off services sucks for some projects and
people. But our growth has made demands upon the Foundation and its Infra
team that have forced our hand. We also have a funding model that just
doesn't support us hiring a team large enough to retain the disparate array
of services that we offered in the past.


>         Since that time, I've helped get the PowerPC build up and running,
> but that's been about it... and even then, I spend little-to-no time on the
> ASF-side of the build bits for those projects I'm interested in simply because
> I have no idea if I'll be wasting my time because "whoops, we've changed
> direction again".


Again, we'll happily link any buildbot into our buildmaster, so you can
automate builds on your special bots. As you can see from above, we won't
be doing PowerPC. Just Ubuntu for all machines and services from now on.
This allows us (via Puppet) to easily reallocate, move, upgrade, and
maintain our services. Years ago, each machine was manually configured, and
when it went down, the Foundation suffered. Today, if a machine goes down,
we can spin it back up in an hour or two due to the consistency.

I do sympathize that our service reduction is painful. But I hope you can
understand where the Foundation (and its Infra team) is coming from. We
have vastly more projects to support today, meaning more uniformity is
required.

Sincerely,
Greg Stein,
Infrastructure Administrator, ASF
