From: Matthew Rocklin
Date: Thu, 15 Sep 2016 08:16:28 -0400
Subject: Re: libhdfs3 development is still going on outside of ASF
To: dev@hawq.incubator.apache.org
Reply-To: dev@hawq.incubator.apache.org
References: <6EAAE263-9555-4520-80F6-5A2112A5AFD4@apache.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114430fac3aaa0053c8ad0e9

--001a114430fac3aaa0053c8ad0e9
Content-Type: text/plain; charset=UTF-8

Hi All,

I joined this e-mail list in order to chime in on this discussion. I'm not
part of Apache HAWQ but *do* use libhdfs3 and know a number of other people
who do as well.

I maintain a library for parallel programming, Dask, which is commonly used
within the PyData software ecosystem. We often interact with data on HDFS
and found libhdfs3 to be an excellent solution, particularly because it
doesn't require JVM interaction, which is rare among our users. To assist
Python users we made the wrapper library hdfs3, which has gotten some
traction both within Dask and outside.

We intentionally released and maintain hdfs3 separately from Dask because
it's a more general and releasable component. This turns out to have been a
good move. There are lots of people who use hdfs3 who have no interest in
using Dask at all. They appreciate this separation because they're not
forced to grab all of Dask in order to just get the single component they
want, hdfs3. These are great users. They range from universities to small
and large businesses. They contribute back to hdfs3 readily and are also,
today, trying to contribute back to libhdfs3. By not tying hdfs3 into Dask
we increased both community engagement and social impact.

So my initial bias is "Please, keep libhdfs3 separate. It will make my life
(and the lives of many others) much more convenient." However, I also
recognize the need for Apache's strict-for-a-reason policies. No matter
what you all decide, the PyData community will find a way to make things
work. I just wanted to make it clear that there are several other
stakeholders out there using this library, so that this decision isn't made
in a vacuum.
Best,
-matthew rocklin

On Thu, Sep 15, 2016 at 2:38 AM, Zhanwei Wang wrote:

> Hi Roman
>
> I think I have said enough about the benefits and drawbacks of merging
> two independent projects together.
> Let me propose a way that might make both ASF and libhdfs3’s users
> happy. I need your advice.
>
> Is it possible to have two git repositories in ASF for the HAWQ
> incubator project? If it is, I propose to solve the libhdfs3 issue like
> this:
>
> 1) Create a new git repository in ASF and push all of libhdfs3’s code
> and branches from GitHub to ASF.
> 2) Make libhdfs3’s GitHub repository a read-only mirror of the ASF
> repository. We may need to transfer ownership of the GitHub repository
> from Pivotal to ASF on GitHub.
> 3) HAWQ keeps the stable version of libhdfs3’s code, or just a Git
> reference.
>
> In this way, we keep libhdfs3 independent and keep all of its pull
> requests, wiki, issues and history. Most importantly, libhdfs3 can
> follow ASF rules and process. People can file pull requests on GitHub,
> commits go to the ASF repository, and they are eventually mirrored to
> GitHub.
>
> Any comments?
>
> Best Regards
>
> Zhanwei Wang
> wangzw@apache.org
>
> On Sep 15, 2016, at 2:19 PM, Zhanwei Wang wrote:
> >
> >> Open source is about community first.
> >
> > Good point Kyle. I strongly agree with you!
> >
> > But unfortunately it seems no one in this thread cares about
> > libhdfs3’s community (users) except me. We are simply ignoring the
> > frustration of libhdfs3’s users and are about to delete its
> > repository.
> >
> > So let’s set the tone of this thread.
> >
> > If we remove libhdfs3’s repository or make it read only:
> > a. What benefit do we get for BOTH HAWQ and libhdfs3’s users?
> > b. What drawback is there for BOTH HAWQ and libhdfs3’s users?
> >
> > The following is my answer.
> >
> > a. Benefit: For HAWQ, it seems ASF governs its property under ASF
> > rules. For libhdfs3’s users, none.
> >
> > b. Drawback: For HAWQ, irrelevant commits will come into HAWQ’s commit
> > log. JIRAs and pull requests will be filed in HAWQ that are not
> > related to HAWQ. Furthermore, a commit in libhdfs3 may break HAWQ and
> > it is hard to debug; I have experienced this often enough. It is
> > important to use the stable version of libhdfs3, and the HAWQ code
> > should only keep the stable version of libhdfs3.
> >
> > libhdfs3’s users would have to ask questions in HAWQ’s community.
> > They would have to clone all of HAWQ to build libhdfs3 and contribute.
> >
> > Let’s think about it further. How do we schedule a release of libhdfs3
> > while HAWQ is under development? Should we branch HAWQ for libhdfs3’s
> > release? Should we merge libhdfs3’s pull requests while we are
> > releasing HAWQ? Do we have to sync the release processes of HAWQ and
> > libhdfs3, and how?
> >
> > Maybe we should involve libhdfs3’s users in this thread. But
> > unfortunately they are not on HAWQ’s mailing list. See, this is
> > another big issue. Discussing dropping libhdfs3’s repository on HAWQ’s
> > mailing list without libhdfs3’s users involved seems odd. Imagine
> > this: one day the repository you are working with is gone and you did
> > not even know about this discussion.
> >
> > If anyone wants to discuss whether we should drop libhdfs3’s
> > repository, the better place is libhdfs3’s repository.
> >
> > In general, merging two independent projects together introduces more
> > trouble than benefit.
> >
> > To be clear, I’m not against ASF rules. I deeply understand their
> > importance. Is there any way to keep HAWQ and libhdfs3 separate and
> > make both ASF and libhdfs3’s users happy? Just like Kyle said, “HOW”
> > is more important.
> >
> > @Roman, your mentoring is important.
> >
> > Any comments?
> >
> > Best Regards
> >
> > Zhanwei Wang
> > wangzw@apache.org
> >
> >> On Sep 15, 2016, at 12:54 PM, Kyle Dunn wrote:
> >>
> >> Chiming in here only as a casual but concerned observer.
> >>
> >> Open source is about community first. If the logistics around "where"
> >> libhdfs3 lives rather than the much more important issue of "how" it
> >> lives are the focus here, I think we've missed the real issue.
> >>
> >> For what it's worth, I concur with others: let's move it to HAWQ
> >> exclusively and move on to addressing the community, starting with
> >> the decision being made and how/where future contributions can be
> >> made.
> >>
> >> My brief scan of libhdfs3 shows numerous open pull requests (with
> >> apparently useful contributions) and several loose-end "issues". We
> >> need to communicate effectively to these contributors whether those
> >> PRs and issues are valuable and relevant. This type of engagement is
> >> what OSS projects live and die by. We need to be better, starting
> >> with libhdfs3, into HAWQ, and beyond.
> >>
> >> "Open source isn't someone else's job" - it's everyone's job. I'm
> >> challenging everyone with commit responsibility on repos to value
> >> community input (both code and issues) as highly as your own backlog.
> >> Pay it forward and maybe the community will start shrinking your
> >> backlog unexpectedly.
> >>
> >> -Kyle
> >>
> >> On Wed, Sep 14, 2016, 21:33 Lei Chang wrote:
> >>
> >>> There was a short discussion before when we moved libhdfs3 to the
> >>> HAWQ repo.
> >>>
> >>> http://mail-archives.apache.org/mod_mbox/incubator-hawq-dev/201602.mbox/%3cCAE44UQe1xgcVOC76T_mgVbgGbR=Lx=XUBPVw18ZK4iZ3euCH+g@mail.gmail.com%3e
> >>>
> >>> I think it makes sense to keep libhdfs3 only in the HAWQ repo to
> >>> simplify Apache builds and releases in the current phase. This is
> >>> what we have done in the past. But it looks like not everyone is on
> >>> the same page.
> >>>
> >>> Cheers
> >>> Lei
> >>>
> >>> On Thu, Sep 15, 2016 at 11:12 AM +0800, "Greg Chase"
> >>> <greg@gregchase.com> wrote:
> >>>
> >>> It's fine if libhdfs3 is a third party license, and is treated that
> >>> way.
> >>>
> >>> However, why does Apache HAWQ want to be dependent on some strange
> >>> 3rd party library with no transparency?
> >>>
> >>> We are having enough difficulties just getting our first release
> >>> out.
> >>>
> >>> Is there a compelling reason why we need to keep up with the
> >>> independently developed libhdfs3 project? Are they willing to make
> >>> necessary changes so that they are compatible with ASF's
> >>> strict-for-a-good-reason policies?
> >>>
> >>> Can we fork hdfs3 for Apache HAWQ's purposes in Apache?
> >>>
> >>> If any libhdfs3 committers are also part of Apache HAWQ, perhaps you
> >>> can shed some light on the viability of this as an independent
> >>> project, since I only see 4 contributors.
> >>>
> >>> -Greg
> >>>
> >>> On Wed, Sep 14, 2016 at 7:54 PM, Hong Wu wrote:
> >>>
> >>>> In my opinion, it is reasonable to transfer the third-party repo of
> >>>> libhdfs3 totally into HAWQ, not only for the convenience of the
> >>>> HAWQ build, but also out of consideration for the ASF project. So
> >>>> for the HAWQ project, I am with Roman.
> >>>>
> >>>> But my concern is the current users of libhdfs3 and all the pull
> >>>> requests, wiki docs and issues. Another uncertain aspect from my
> >>>> perspective is that although HAWQ could not run without libhdfs3,
> >>>> libhdfs3 could be used in other open source projects; that might
> >>>> have been the true point of making libhdfs3 open source in the
> >>>> beginning.
> >>>>
> >>>> In summary, if it is really against the spirit of an ASF project
> >>>> for HAWQ, a suggested way might be marking the original libhdfs3
> >>>> repo as a legacy repo instead of removing it.
> >>>>
> >>>> Best
> >>>> Hong
> >>>>
> >>>> 2016-09-15 10:04 GMT+08:00 Zhanwei Wang:
> >>>>
> >>>>> Currently libhdfs3’s official code is not the same as in HAWQ.
> >>>>> Some new code has not been copied into HAWQ. I do not think code
> >>>>> changes to libhdfs3 should follow HAWQ’s commit process, because
> >>>>> many changes are not related to HAWQ.
> >>>>>
> >>>>> From the HAWQ side, I suggest keeping the stable version of its
> >>>>> third-party libraries and copying new libhdfs3 code only when it
> >>>>> is necessary.
> >>>>>
> >>>>> libhdfs3 was open source years before HAWQ entered incubation,
> >>>>> with a separate permission of its authority. So in my opinion it
> >>>>> is a third party, and it actually was a third party before HAWQ
> >>>>> entered incubation. And HAWQ is not the only user.
> >>>>>
> >>>>> Best Regards
> >>>>>
> >>>>> Zhanwei Wang
> >>>>> wangzw@apache.org
> >>>>>
> >>>>>> On Sep 15, 2016, at 9:35 AM, Roman Shaposhnik wrote:
> >>>>>>
> >>>>>> On Wed, Sep 14, 2016 at 6:29 PM, Zhanwei Wang wrote:
> >>>>>>> Hi Roman
> >>>>>>>
> >>>>>>> libhdfs3 works as a third-party library of HAWQ. Just for the
> >>>>>>> convenience of the HAWQ release process we copy its code into
> >>>>>>> HAWQ. The reason is that HAWQ used to depend on a specific
> >>>>>>> version of libhdfs3, libhdfs3 is only distributed as source
> >>>>>>> code, and the build process is complicated.
> >>>>>>
> >>>>>> I actually don't buy this argument. libhdfs3 is not an optional
> >>>>>> dependency for HAWQ like ORCA is (for example). Without libhdfs3
> >>>>>> it's pretty tough to imagine HAWQ.
> >>>>>> As such the code base needs to be governed as part of the ASF
> >>>>>> project, not a random GitHub dependency.
> >>>>>>
> >>>>>> IOW, let me ask you this: were all the changes that went into
> >>>>>> the libhdfs3 that is part of HAWQ discussed and reviewed via the
> >>>>>> ASF development process, or did you just import them from time
> >>>>>> to time as this comment suggests:
> >>>>>> https://issues.apache.org/jira/browse/HAWQ-1046?focusedCommentId=15489669&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15489669
> >>>>>> ?
> >>>>>>
> >>>>>>> I do not think we have any reason to shut down a third party’s
> >>>>>>> official repository.
> >>>>>>
> >>>>>> You say 3rd party as though it's not just you guys maintaining
> >>>>>> it on the side.
> >>>>>>
> >>>>>>> We also copy the Google Test source code into HAWQ, just as we
> >>>>>>> did for libhdfs3.
> >>>>>>
> >>>>>> But this is very different. You don't do any development
> >>>>>> (certainly you don't do any non-trivial development) of that
> >>>>>> code.
> >>>>>>
> >>>>>>> libhdfs3 is open source under the Apache License version 2,
> >>>>>>> just the same as HAWQ. So I believe there is no license issue.
> >>>>>>
> >>>>>> You're correct. There's no licensing issue but there's a pretty
> >>>>>> significant governance issue.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Roman.
> >>>
> >>> --
> >> *Kyle Dunn | Data Engineering | Pivotal*
> >> Direct: 303.905.3171 | Email: kdunn@pivotal.io

--001a114430fac3aaa0053c8ad0e9--