hbase-dev mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: [DISCUSS] More Shading
Date Wed, 12 Apr 2017 16:01:48 GMT
On Wed, Apr 12, 2017 at 8:28 AM Josh Elser <elserj@apache.org> wrote:

> Sean Busbey wrote:
> > On Tue, Apr 11, 2017 at 11:43 PM Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> >>> This effort is about our internals. We have a mess of other components
> >>> all up inside us such as HDFS, etc., each with their own sets of
> >>> dependencies, many of which we have in common. This project is about
> >>> making it so we can upgrade at a rate independent of when our
> >>> upstreamers choose to change.
> >>
> >> Pardon as I try to get a handle on the intention behind this thread.
> >>
> >> If the above quote is true, then I think what we want is a set of shaded
> >> Hadoop client libs that we can depend on so as to not get all the
> >> transitive deps. Hadoop doesn't provide it, but we could do so ourselves
> >> with (yet another) module in our project. Assuming, that is, the upstream
> >> client interfaces are well defined and don't leak stuff we care about. It
> >> also creates a terrible nightmare for anyone downstream of us who
> >> repackages HBase. The whole thing is extremely error-prone, because
> >> there's not very good tooling for this. Realistically, we end up with a
> >> combination of the enforcer plugin and maybe our own custom plugin to
> >> ensure clean transitive dependencies...
> >
> > Hadoop does provide a shaded client as of the 3.0.0* release line. We
> > could push as a community for a version of that for Hadoop's branch-2.
> >
> > Unfortunately, that shaded client won't help where we're reaching into
> > the guts of Hadoop (like our reliance on their web stuff).
> Well put, Nick.
>
> With Sean's point about the Hadoop shaded client, it seems to me that we
> have things which could be pursued in parallel:
>
> 1) Roadmap to Hadoop3 (and the shaded hdfs client).
> 2) Identify the components which we use from Hadoop; for each component:
>    2a) Work with Hadoop to isolate that component from other cruft (the
> best example is the Configuration class -- you get something like 8MB of
> "jar" just to parse an XML file).
>    2b) Pull the implementation into HBase, removing the dependency on
> Hadoop entirely.
>
> I think that both of these can/should be done in parallel to the
> isolation of the dependencies which HBase requires (isolating ourselves
> from upstream, and isolating downstream from us).

Hang on, these are two different concerns.

Isolating ourselves from Hadoop follows the line of thought around Hadoop's
shaded client jars. If we must have this for HBase 2.0/Hadoop 2.8, we can
probably backport their efforts as modules in our own build. See my earlier
comment about this being error-prone for folks who re-package us. Either
way, it's time we firm up the boundaries between us and Hadoop.
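For concreteness, a backport along those lines would roughly be a module
driving the maven-shade-plugin to relocate the offending packages. A minimal
sketch, with an illustrative relocation prefix (the module name and prefix
here are made up, not anything in our tree):

```xml
<!-- Hypothetical pom.xml fragment for an "hbase-shaded-hadoop" module.
     The relocation prefix is illustrative only. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <relocations>
              <!-- Rewrite Hadoop's Guava references so they can't collide
                   with whatever Guava version HBase ships. -->
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>org.apache.hbase.shaded.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Multiply that by every dependency Hadoop and HBase have in common, and you
can see why the error-prone part is keeping the relocation list complete.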

Isolating our clients from our deps is best served by our shaded modules.
What do you think about turning things on their head: for 2.0, the
hbase-client jar is the shaded artifact by default, not the other way
around? We have cleanup to do to get our deps out of our public interfaces
in order to make this work.
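To make the flip concrete: downstream users would then pull a single shaded
artifact and never see our transitive deps on their classpath. A sketch of
what the downstream pom would look like (coordinates illustrative):

```xml
<!-- Hypothetical downstream pom.xml fragment: depending on the shaded
     client means HBase's Guava, Netty, etc. never leak onto your
     classpath. The artifactId and version shown are illustrative. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-shaded-client</artifactId>
  <version>2.0.0</version>
</dependency>
```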

This proposal of an external shaded-dependencies module sounds like an
attempt to solve both concerns at once. It would isolate us from Hadoop's
deps, and it would isolate our clients from our deps. However, it doesn't
isolate our clients from Hadoop's deps, so our users don't really gain
anything from it. I also argue that it creates an unreasonable
release-engineering burden on our project. And I'm not clear on the
implications for downstreamers who extend us with coprocessors.

