hawq-dev mailing list archives

From Roman Shaposhnik <ro...@shaposhnik.org>
Subject Re: libhdfs3 development is still going on outside of ASF
Date Fri, 16 Sep 2016 05:17:23 GMT
On Wed, Sep 14, 2016 at 11:19 PM, Zhanwei Wang <wangzw@apache.org> wrote:
>> Open source is about community first.
>
> Good point Kyle. I strongly agree with you!
>
> But unfortunately seems no one in this thread care about libhdfs3’s community (users)
> except me.

Quite the contrary -- I think we all do. In fact, part of the reason for
making sure that it gets maintained as part of Apache HAWQ (incubating) is to
ensure the long-term viability of the project.

> Positively ignore the frustration of libhdfs3 users and about to delete it’s repository.

I don't think the frustration is related to whether we delete it or not. I think
the frustration is related to the fact that the current model of libhdfs3 living
in a random, separate GH repo:
   1. does NOT have a clear governance model: the bigger ASF community doesn't
   really monitor pull requests, there's no clear way of filing issues against
   it, etc.

   2. does NOT have a clear release policy: the last release appears to be
   Dec 17, 2015, and even that doesn't clearly indicate what the release
   criteria for it were.

   3. does NOT have a clear path of integration with HAWQ.

> So let’s set the tone of this thread.
>
>  If we remove libhdfs3’s repository or make it read only:
>   a. What benefit we can get for BOTH HAWQ and libhdfs3’s users?
>   b. What drawback for BOTH HAWQ and libhdfs3’s users?
>
> The following is my answer.
>
> a. Benefit: For HAWQ, seems ASF govern its property with ASF rules.  For libhdfs3’s
> users, none.

Once again -- I disagree. For libhdfs3 users the benefit is a predictable
process of how to contribute, how to consume releases without fearing problems
around intellectual property issues (who's making sure that the code in that
random GH repo is clean?) and stability.

> b. Drawback: For HAWQ, not relevant commits will come into HAWQ’s commit log.

At this point HAWQ consists of many parts. Not all of them are of interest to
all people (e.g. PXF is quite separate) but all of them are needed to make HAWQ
awesome. IOW, HAWQ developers absolutely should care about libhdfs3 commits.
The whole purpose of libhdfs3 is to be the best darn HDFS interface library
*for* HAWQ, not a generic implementation.

> JIRA and pull request will be fired in HAWQ but not related to HAWQ.  Furthermore
> commit in libhdfs3 may break HAWQ and it’s hard to debug, I have experienced it enough.
> It is important to use the stable version of libhdfs3, HAWQ code should only keep the
> stable version of libhdfs3.

Well, suffice it to say that I pretty strongly disagree with all of the above
points.

>     For libhdfs3’s user, they have to ask question in HAWQ’s community.

Yes, because that's the most appropriate community today for these questions
to be asked. Remember that the HDFS community is busy building an alternative
to libhdfs3 (or at least polishing the existing one). It's not like HAWQ’s
libhdfs3 is the only choice to interface with HDFS from C and C++.

Also, remember that HAWQ is the driving force behind libhdfs3. Suppose,
for example, that the HAWQ community makes a choice to stick with a particular
version of Hadoop as the default choice for where HAWQ runs the best -- well,
then libhdfs3 will stick to that choice as well, regardless of whether the
non-HAWQ users feel, for example, that Hadoop 3 is a priority.

> They have to clone entire HAWQ to build libhdfs3 and contribute.

Well, if HAWQ makes a proper ASF release and releases the library separately,
they won't have to. In fact, I'd say that downstream consumers should never
clone the repo -- they should *always* use a released version.

> Let’s think about more. How we schedule a release of libhdfs3 when HAWQ is under developing?

Releases in ASF are pretty cheap once you get over the initial hurdles of
IP issues. Which, see my point above, is exactly what worries me about
what we are giving to our users in that random GH repo.

> Should we branch HAWQ for libhdfs3’s release?

No. You just do a release.

> Should we merge libhdfs3’s pull request when we
> are releasing HAWQ?

Yes, absolutely -- all releases are always done on a release branch
and meanwhile pull requests could land in trunk (and then an RM
can decide to pull certain commits into a release branch).

> Do we have to sync the release process of HAWQ and libhdfs3 and how?

Simple: you just do a release.

> Maybe we should better involve libhdfs3’s users into this thread. But unfortunately
> they are not in HAWQ’s mail list.

So where are they? Are they given the tools to collaborate? Do they have a
mailing list? Do they have a website? Do they have an issue tracker?

> In general merge two independent project together introduce more trouble than benefit.

They are NOT independent today.

> To be clear, I’m not against ASF rule. I’m deeply understand the importance of it.
> Is there any way to make HAWQ and libhdfs3 separated and make both ASF and
> libhdfs3’s user happy?

There are no ASF users vs libhdfs3 users. There are project users, and today
that project is called HAWQ.

> Just like Kyle said, “HOW” is more important.
>
> @Roman, your mentoring is important.
>
>
> Any comments?

I think I provided plenty. Hope this helps.

Thanks,
Roman.
