hadoop-general mailing list archives

From Stack <st...@duboce.net>
Subject Re: LimitedPrivate and HBase (thoughts from the build and test world)
Date Thu, 09 Jun 2011 19:23:27 GMT
Nice reality check, and thanks for the account of how it was addressed elsewhere, Steve.

As you say, it sounds like a large undertaking but it would be a sweet
service for the downstreamers.

St.Ack

On Thu, Jun 9, 2011 at 4:42 AM, Steve Loughran <stevel@apache.org> wrote:
> On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
>>
>> I do not see any issue with the change that Todd has made. We have done
>> similar changes in HDFS-1586 in the past.
>>
>> Making APIs public comes with a cost. That is what we are avoiding with
>> LimitedPrivate. The intention was to include the following projects that
>> are
>> closely tied to Hadoop as projects eligible for LimitedPrivate.
>> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
>> future.
>
> I'm going to talk about my experience on the Ant team.
>
> One of the lessons of that project is that in the open source world, you
> can't predict how your code gets used, or control it. If someone wants to
> take your app and use it as a library - they can. If someone wants to do
> something completely unexpected with that library - they can. And this is a
> good thing, because your code gets used. Yes, you get new bugreps, but every
> person using your code is someone not using somebody else's code. You win.
>
> The other lesson from that is the following: in open source, there is no
> such thing as private code.
>
> * If you mark something as package scoped, they just inject their classes
> into your package (and who hasn't done that with their Hadoop extensions?).
> * If you mark something as protected, they subclass it and open up its
> privacy.
> * If you mark something as private, they edit your source and create a new
> JAR with the relaxed permissions.
>
> For any of these actions, you end up fielding the bugreps, as the stack
> trace points to you. And it increases maintenance costs for everyone.
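To make the subclassing point concrete, here is a minimal sketch of how downstream code can legally widen a protected member to public in Java. The class names here are invented for illustration; this is not real Hadoop code.

```java
// HadoopishBase stands in for any library class with a protected member.
class HadoopishBase {
    // The library author intended this for subclasses only.
    protected String internalDetail() {
        return "not really private";
    }
}

// Downstream code "opens up its privacy": Java allows an override to
// widen the access modifier (protected -> public), never to narrow it.
class Exposer extends HadoopishBase {
    @Override
    public String internalDetail() {
        return super.internalDetail();
    }
}

public class VisibilityDemo {
    public static void main(String[] args) {
        // Any caller can now reach what the library marked protected.
        System.out.println(new Exposer().internalDetail());
    }
}
```

The same trick works in reverse for package-scoped members: the downstream project simply declares its own classes in your package.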
>
>
> Alternatively they cut and paste your code into their codebase, possibly
> -but not always- retaining the apache credits.
>
> That:
>  * complicates copyright and lawsuits:
>    http://www.theserverside.com/news/thread.tss?thread_id=29958
>  * increases maintenance costs for everyone, especially if there are
>    security issues with the original code.
>
>> When such projects break because of an API change, we can co-ordinate as a
>> community and fix the issues. This is not true when some application that
>> we do not know of breaks!
>
> The way Ant handled this was with Gump, the nightly clean build of all the
> OSS Java projects built with Ant:
> http://vmgump.apache.org/gump/public/
>
> The projects thought they were getting a free CI build run, but what it
> really was was a regression test of Ant against every single OSS project.
> If a change in Ant broke anyone's build, we noticed. If a change in Log4J
> broke a build, someone noticed. It became a rapid-response regression test
> for the entire OSS suite.
>
> Sadly, it doesn't work so well. I'd blame Maven, but the move to ivy
> dependencies doesn't help either, it complicates classpaths no end.
>
> Even so, the idea is great: build and test your downstream applications, and
> the things you depend on, so you find problems within 24 hours of the change
> being committed -regardless of which project committed the change.
>
> The way to do it now would be with Jenkins, not just building and testing
> Hadoop-{core, hdfs, mapreduce}, but:
>  - building and publishing every upstream dependency.
>  - testing against the trunk versions built locally.
>  - building and testing against the ivy-versioned artifacts that are
>    controlled by the version.properties file.
>
> Together these flag up when something works against the old artifacts but
> doesn't work against the trunk versions: those are their regressions, caught
> early.
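The two build legs above could be distinguished by which coordinates the build resolves. A hypothetical version.properties fragment sketching the split (the property names and version numbers are invented for illustration, not the project's actual file):

```properties
# Leg 1: build against the released, ivy-controlled artifacts.
hadoop-common.version=0.20.203.0
hbase.version=0.90.3

# Leg 2: build against trunk artifacts published to the local repository
# by an upstream Jenkins job, e.g.:
# hadoop-common.version=0.23.0-SNAPSHOT
# hbase.version=0.91.0-SNAPSHOT
```

A test that passes on leg 1 but fails on leg 2 points straight at a change committed to trunk.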
>
> Downstream:
>  - build and test the OSS projects that work with Hadoop.
> That's the Apache ones: HBase, Mahout, Pig, Hive, Hama etc., and the other
> ones, such as Cascading.
>
> That can be offered as a service to these projects "we will build and test
> your code against our trunk", a service designed to benefit everyone. They
> find their bugs, we find regressions.
>
> This is a pretty complex project, especially when you think about the
> challenge of testing that your RPM generation code produces RPMs that
> actually install (I bring up clean CentOS VMs for that purpose), but without
> it you don't get everything working together, which is the state things
> appear to be in today.
>
> Ignoring the RPM install & test problems, if people are interested in
> working on this, we should be able to do a lot of it on Jenkins. Who is
> willing to get involved?
>
> -Steve
>
