incubator-general mailing list archives

From Bruno Mahé <br...@cloudera.com>
Subject Re: [DISCUSS] BOM and supported platforms for Bigtop 0.4.0
Date Thu, 03 May 2012 23:30:25 GMT
Hi,

Please see my reply inline.

On 05/03/2012 04:00 PM, Owen O'Malley wrote:
> On Thu, May 3, 2012 at 12:01 PM, Bruno Mahé <bmahe@apache.org> wrote:
>>>
>>> As a mentor of the Bigtop project, I don't see it as acceptable for an
>>> Apache project to distribute binaries of non-Apache software. If the
>>> owners of the Hue project decide to donate it to Apache and it had
>>> been released by Apache, then it would be acceptable. I'm strictly -1
>>> on releasing any version of Bigtop with Hue or any other non-Apache
>>> software as part of the release.
>>>
>>> -- Owen
>>
>>
>> As part of mentoring Apache Bigtop (incubating) project, it would also
>> be greatly appreciated if you would explain why this -1.
>>
>> Apache Bigtop (incubating) does not and will not include anything that
>> does not belong to the Apache Foundation.
>> So I am really confused as to why this strong reaction.
> 
> The strong reaction is because Roman was proposing a Bigtop release
> with rpms and debs for non-Apache projects. That is a non-starter.
> Apache will not distribute non-Apache projects. Saying that Bigtop
> does not release the projects that it incorporates is not justified
> given the fact that Bigtop is putting rpms of each of the incorporated
> projects into /dist/incubator/bigtop. The 2.9GB size of the latest
> Bigtop release has already caused infrastructure significant
> headaches.
> 

Seems like you are still (this is not the first time this has been
explained to you) conflating Apache releases, on which members vote,
with convenience artefacts.
Apache Bigtop (incubating) releases are not the packages. Apache Bigtop
(incubating) releases are Apache Bigtop (incubating) source code.
RPMs and DEBs are convenience artefacts.
If they are not that convenient to the Apache Foundation, I don't see
the issue with not distributing the ones that are not convenient.

As far as I know, Apache Infra was only asking for a heads-up, which we
will provide, and we will take care to work more closely with them.
I also fail to see the relationship between the size of the convenience
artefacts and the bill of materials for the coming release of Apache
Bigtop (incubating), which I repeat only contains Apache Bigtop
(incubating) source code.

So now that we have established that your issue is about the convenience
artefacts, I don't see any remaining issue with Apache Bigtop
(incubating) releases.


>> The convenience artefact may pull Hue in, but this is in no way
>> different from Apache Hadoop pulling in Google protocol buffer or Google
>> guava. So again, how is this different? Is Apache Hadoop going to
>> abandon Google Protocolbuffer?
> 
> There is a big difference between referencing external projects that
> are required for your project's functionality and incorporating
> non-Apache projects into your project and publishing releases of them
> using independent artifacts. When the user installs a Hadoop rpm, the
> protobuf.jar is there under the hood, but is considered an
> implementation detail that is required for Hadoop to run.
> 
> I'd complain similarly if Hadoop was downloading protobuf tarballs,
> making changes to protobuf, making protobuf rpms with those changes,
> and publishing those rpms on Apache's servers.
> 

In any case, there is still distribution of a non-Apache project's
artefacts by both projects.
You either distribute artefacts of it, or you don't. Here the end goal
is not to provide packages, but a deployable big data stack. Packages
are just a means to an end.
We don't distribute upstream projects, they are dependencies.


> However, it goes deeper than that. If the user installs Bigtop's
> rpms and hits a bug do they contact Hue or Bigtop? Furthermore, I'm
> sure the links that are displayed when you run Bigtop's Hue point off
> to Cloudera's bug and support system. That kind of branding is not ok
> for an Apache project.
> 

Hue is not even integrated into Apache Bigtop (incubating). So let's
cross that bridge when we get there. And in any case, this is an issue
that can be fixed trivially, so I wouldn't make it a blocker.

But beyond that, we don't patch anything. So any product issue would
come from the product. Any integration issue would be an Apache Bigtop
(incubating) issue. The same goes for Apache Hadoop.
No matter what you do, educating users will be the most important part.


> Even with Apache projects, Bigtop may become problematic. Look at the
> mess that happened when Lucene made a bugfix release of Apache Commons
> CSV. Lucene needed a bugfix release of CSV, didn't wait for CSV to
> release, and instead released it themselves. Needless to say Apache
> Commons didn't like that result. Bigtop is overriding decisions made
> by the upstream projects about things like the way the launching
> scripts operate and where the configuration directory is. When asked
> to correct it, they complain about compatibility for their users
> rather than compatibility for Hadoop's users.
> 

How is this related to Hue?
I am also not sure about the point of complaining about this here on
general@. This is a direction taken by the Apache Bigtop (incubating)
community at large. You may not agree, but there is a consensus on that
part. Unless there is something in the Apache Foundation bylaws that
would force an upstream project to force decisions on other communities?

And beyond that, this is not new. As was explained in another thread
to Matt on the Apache Bigtop (incubating) mailing list:
Apache Bigtop (incubating) has made the choice from the very beginning
to be close to what GNU/Linux distributions have been doing and what
sysadmins have been used to. This can differ from the experience one may
have with a tarball development of Apache Hadoop.

Keep in mind also that each component has its own way of doing things,
each one having its own issues.
Apache Bigtop (incubating) smooths this over and provides an easy and
unified experience to users. Furthermore, we cannot satisfy every whim
of every single upstream component.

In order to use pristine Apache Hadoop, one would have to be familiar
with its usage and its configuration. The Apache Hadoop experience is only
familiar to people *already* familiar with Apache Hadoop.

So using upstream Apache Hadoop will imply a lot of reading through
forums and documentation, and a lot of frustration.
For instance, Apache Bigtop (incubating) will pre-set the ulimits for
you, will set up the logging to well-known locations, provide init
scripts and make a pseudo-configuration available to users.

In a word, Apache Bigtop (incubating) has a different use case than
upstream Apache Hadoop and therefore will be different in some areas.
Or, to put it differently, upstream projects should focus on what they do
best, which is to work on delivering awesome projects, while in Apache
Bigtop (incubating) we focus on what we do best, which is making a
production quality/ready big data stack.

And I don't see the issue of caring about compatibility of Apache Bigtop
(incubating) users within the Apache Bigtop (incubating) project.

But again, all of this is unrelated to this thread. So you may want to
move that discussion to another thread.

Thanks,
Bruno

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

