From: Josh Elser
Date: Wed, 12 Apr 2017 11:28:41 -0400
To: dev@hbase.apache.org
Subject: Re: [DISCUSS] More Shading

Sean Busbey wrote:
> On Tue, Apr 11, 2017 at 11:43 PM Nick Dimiduk wrote:
>
>>> This effort is about our internals. We have a mess of other components all
>>> up inside us such as HDFS, etc., each with their own sets of dependencies,
>>> many of which we have in common. This project is about making it so we
>>> can upgrade at a rate independent of when our upstreamers choose to change.
>>
>> Pardon me as I try to get a handle on the intention behind this thread.
>>
>> If the above quote is true, then I think what we want is a set of shaded
>> Hadoop client libs that we can depend on so as to not get all the
>> transitive deps. Hadoop doesn't provide it, but we could do so ourselves
>> with (yet another) module in our project. Assuming, that is, the upstream
>> client interfaces are well defined and don't leak stuff we care about. It
>> also creates a terrible nightmare for anyone downstream of us who
>> repackages HBase. The whole thing is extremely error-prone, because there's
>> not very good tooling for this. Realistically, we end up with a combination
>> of the enforcer plugin and maybe our own custom plugin to ensure clean
>> transitive dependencies...
>>
> Hadoop does provide a shaded client as of the 3.0.0* release line. We could
> push as a community for a version of that for Hadoop's branch-2.
>
> Unfortunately, that shaded client won't help where we're reaching into the
> guts of Hadoop (like our reliance on their web stuff).

Well put, Nick. With Sean's point about the Hadoop shaded client, it seems
to me that we have things which could be pursued in parallel:

1) Roadmap to Hadoop 3 (and the shaded HDFS client).
2) Identify the components which we use from Hadoop. For each component:
  2a) Work with Hadoop to isolate that component from other cruft (the best
      example is the Configuration class -- you get something like 8MB of
      "jar" just to parse an XML file).
  2b) Pull the implementation into HBase, removing the dependency on Hadoop
      entirely.
I think that both of these can/should be done in parallel to the isolation of the dependencies which HBase requires (isolating ourselves from upstream, and isolating downstream from us).