hadoop-general mailing list archives

From Milind Bhandarkar <mbhandar...@linkedin.com>
Subject Re: LimitedPrivate and HBase (thoughts from the build and test world)
Date Fri, 10 Jun 2011 02:21:37 GMT
[Just wondering if one of the criteria for graduating to a top-level
project should be "no dependency on the LimitedPrivate APIs of the parent
project".]


I agree with your suggestion for a downstream-project-build-and-test

All I can say is, "stay tuned".

- milind

Milind Bhandarkar

On 6/9/11 4:42 AM, "Steve Loughran" <stevel@apache.org> wrote:

>On 06/08/2011 06:41 PM, Suresh Srinivas wrote:
>> I do not see any issue with the change that Todd has made. We have done
>> similar changes in HDFS-1586 in the past.
>> Making APIs public comes with a cost. That is what we are avoiding with
>> LimitedPrivate. The intention was to include the following projects that
>> are closely tied to Hadoop as projects eligible for LimitedPrivate:
>> {"HBase", "HDFS", "Hive", "MapReduce", "Pig"}. This list could grow in the
>> future.
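For readers outside Hadoop: LimitedPrivate refers to Hadoop's interface-audience annotation (`org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate`), which tags a class or member with the projects that are allowed to use it. The sketch below is a minimal, self-contained illustration that does not depend on Hadoop. `LimitedPrivate` here is a hypothetical stand-in annotation (given runtime retention so reflection can read it), and `BlockReaderInternals` and `isIntendedFor` are made-up names. It also previews the point made in the reply: the annotation records intent, but the compiler does not stop an unlisted project from using the class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

class AudienceCheckDemo {
    public static void main(String[] args) {
        System.out.println(isIntendedFor(BlockReaderInternals.class, "HBase"));
        System.out.println(isIntendedFor(BlockReaderInternals.class, "Cascading"));
        // This compiles for any caller: the annotation documents intent, it does not enforce it.
        new BlockReaderInternals();
    }

    /** True if the class carries no audience restriction, or lists the project in its audience. */
    static boolean isIntendedFor(Class<?> cls, String project) {
        LimitedPrivate lp = cls.getAnnotation(LimitedPrivate.class);
        if (lp == null) {
            return true; // no LimitedPrivate marker recorded on this class
        }
        return Arrays.asList(lp.value()).contains(project);
    }
}

/** Hypothetical stand-in for Hadoop's InterfaceAudience.LimitedPrivate. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface LimitedPrivate {
    String[] value();
}

/** Hypothetical internal class opened up only to the projects named in the thread. */
@LimitedPrivate({"HBase", "HDFS", "Hive", "MapReduce", "Pig"})
class BlockReaderInternals {
}
```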
>I'm going to talk about my experience on the Ant team.
>One of the lessons of that project is that in the open source world, you
>can't predict or control how your code gets used. If someone wants
>to take your app and use it as a library, they can. If someone wants to
>do something completely unexpected with that library, they can. And this
>is a good thing, because your code gets used. Yes, you get new bug reports,
>but every person using your code is someone not using somebody else's
>code. You win.
>The other lesson from that is the following: in open source, there is no
>such thing as private code.
>* If you mark something as package scoped, they just inject their
>classes into your package (and who hasn't done that with their Hadoop
>* If you mark something as protected, they subclass and open up its
>* If you mark something as private, they edit your source and create a
>new JAR with the relaxed permission
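A minimal Java sketch of the protected-member case, assuming hypothetical classes `Scheduler` and `OpenedScheduler`: Java lets an override widen access from `protected` to `public`, so a downstream subclass can expose a method the library meant to keep internal. In this single-file demo every class shares the default package, where `protected` members are already visible, so the widening is shown with `Class.getMethods()`, which lists only public methods. Across packages, the effect is that any caller holding an `OpenedScheduler` can now invoke the method.

```java
import java.lang.reflect.Method;

class WideningDemo {
    public static void main(String[] args) {
        System.out.println(isPublicMethod(Scheduler.class, "pickQueue"));       // false
        System.out.println(isPublicMethod(OpenedScheduler.class, "pickQueue")); // true
        System.out.println(new OpenedScheduler().pickQueue("alice"));            // default
    }

    /** getMethods() returns only public methods (declared or inherited). */
    static boolean isPublicMethod(Class<?> cls, String name) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical library class: pickQueue is meant for subclasses only. */
class Scheduler {
    protected String pickQueue(String user) {
        return "default";
    }
}

/** Hypothetical downstream subclass that widens the override to public. */
class OpenedScheduler extends Scheduler {
    @Override
    public String pickQueue(String user) {
        return super.pickQueue(user);
    }
}
```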
>For any of these actions, you end up fielding the bug reports, as the stack
>trace points to you. And it increases maintenance costs for everyone.
>Alternatively, they cut and paste your code into their codebase, possibly
>(but not always) retaining the Apache credits, which
>  * complicates copyright and lawsuits:
>  http://www.theserverside.com/news/thread.tss?thread_id=29958
>  * increases maintenance costs for everyone, especially if there are
>security issues with the original code.
>> When such projects break because of an API change, we can co-ordinate as
>> a community and fix the issues. This is not true when some application
>> that we do not know of breaks!
>The way Ant handled this was with Gump, the nightly clean build of all the
>OSS Java projects built with Ant.
>All the projects thought they were getting a free CI build run, but what
>they really got was a regression test of Ant and every single OSS project.
>If a change in Ant broke anyone's build, we noticed. If a change in Log4J
>broke a build, someone noticed. It became a rapid-response regression test
>for the entire OSS suite.
>Sadly, it doesn't work so well. I'd blame Maven, but the move to Ivy
>dependencies doesn't help either; it complicates classpaths no end.
>Even so, the idea is great: build and test your downstream applications,
>and the things you depend on, so you find problems within 24 hours of
>the change being committed, regardless of which project committed it.
>The way to do it now would be with Jenkins, not just building and
>testing Hadoop-{core, hdfs, mapreduce}, but also
>  -building and publishing every upstream dependency.
>  -testing against the trunk versions built locally.
>  -building and testing against the Ivy-versioned artifacts that are
>controlled by version.properties.
>Together, this flags up when something works against the old artifacts
>but doesn't work against the trunk versions: those are their regressions,
>caught early.
>  -building and testing the OSS projects that work with Hadoop: the
>  Apache ones (HBase, Mahout, Pig, Hive, Hama, etc.) and others, such
>as Cascading.
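The cross-check described above can be sketched as a classification over two builds per project: one against the released (Ivy-versioned) artifacts and one against trunk. This is a hypothetical sketch, not an existing Jenkins job or plugin; the `Result` type, the verdict names, and the sample pass/fail values in `main` are made up for illustration. A project that passes against released artifacts but fails against trunk points at a regression in the newer build; one that fails on both is most likely a bug in the project itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RegressionMatrixDemo {
    public static void main(String[] args) {
        // Hypothetical nightly results, for illustration only.
        List<Result> nightly = Arrays.asList(
                new Result("HBase", true, false),
                new Result("Pig", true, true),
                new Result("Cascading", false, false));
        for (Result r : nightly) {
            System.out.println(r.project + ": " + classify(r.passesOnReleased, r.passesOnTrunk));
        }
        System.out.println("regressions: " + regressions(nightly));
    }

    enum Verdict { OK, REGRESSION, FAILS_ONLY_ON_RELEASED, BROKEN_ON_BOTH }

    /** One project's outcome against released artifacts and against trunk. */
    static final class Result {
        final String project;
        final boolean passesOnReleased;
        final boolean passesOnTrunk;

        Result(String project, boolean passesOnReleased, boolean passesOnTrunk) {
            this.project = project;
            this.passesOnReleased = passesOnReleased;
            this.passesOnTrunk = passesOnTrunk;
        }
    }

    static Verdict classify(boolean passesOnReleased, boolean passesOnTrunk) {
        if (passesOnReleased && passesOnTrunk) {
            return Verdict.OK;
        }
        if (passesOnReleased) {
            return Verdict.REGRESSION; // worked on old artifacts, broken by trunk
        }
        if (passesOnTrunk) {
            return Verdict.FAILS_ONLY_ON_RELEASED;
        }
        return Verdict.BROKEN_ON_BOTH; // most likely the project's own bug
    }

    /** Names of projects whose failure only appears against trunk. */
    static List<String> regressions(List<Result> results) {
        List<String> out = new ArrayList<>();
        for (Result r : results) {
            if (classify(r.passesOnReleased, r.passesOnTrunk) == Verdict.REGRESSION) {
                out.add(r.project);
            }
        }
        return out;
    }
}
```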
>That can be offered as a service to these projects, "we will build and
>test your code against our trunk": a service designed to benefit
>everyone. They find their bugs, we find regressions.
>This is a pretty complex project, especially when you think about the
>challenge of testing that the RPMs your RPM generation code produces
>actually install (I bring up clean CentOS VMs for such a purpose), but
>without it you don't get everything working together, which is the state
>things appear to be in today.
>Ignoring the RPM install & test problems, if people are interested in
>working on this, we should be able to do a lot of it on Jenkins. Who is
>willing to get involved?
