hbase-dev mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: [DISCUSS] More Shading
Date Thu, 13 Apr 2017 23:36:11 GMT

Nick Dimiduk wrote:
>> Well put, Nick.
>>
>> With Sean's point about the Hadoop shaded client, it seems to me that we
>> have things which could be pursued in parallel:
>>
>> 1) Roadmap to Hadoop3 (and shaded hdfs client).
>> 2) Identify components which we use from Hadoop, for each component:
>>     2a) Work with Hadoop to isolate that component from other cruft (best
>> example is the Configuration class -- you get something like 8MB of
>> "jar" just to parse an xml file).
>>     2b) Pull the implementation into HBase, removing dependency from
>> Hadoop entirely.
>>
>> I think that both of these can/should be done in parallel to the
>> isolation of the dependencies which HBase requires (isolating ourselves
>> from upstream, and isolating downstream from us).
> Hang on, these are two different concerns.
> Isolating ourselves from Hadoop follows the line of thought around Hadoop's
> shaded client jars. If we must have this for HBase 2.0/Hadoop 2.8, we can
> probably backport their efforts as modules in our own build. See my earlier
> comment about this being error prone for folks who re-package us. Either
> way, it's time we firm up the boundaries between us and Hadoop.
> Isolating our clients from our deps is best served by our shaded modules.
> What do you think about turning things on their head: for 2.0 the
> hbase-client jar is the shaded artifact by default, not the other way
> around? We have cleanup to get our deps out of our public interfaces in
> order to make this work.

+1. Worst case, people bloat their applications with duplicate
dependencies (up to 2x the size), but it removes the runtime breakages
caused by multiple versions of a class (because of HBase). I'd gladly
pay that size cost any day.

> This proposal of an external shaded dependencies module sounds like an
> attempt to solve both concerns at once. It would isolate ourselves from
> Hadoop's deps, and it would isolate our clients from our deps. However, it
> doesn't isolate our clients from Hadoop's deps, so our users don't really
> gain anything from it. I also argue that it creates an unreasonable release
> engineering burden on our project. I'm also not clear on the implications
> to downstreamers who extend us with coprocessors.

I thought I had a reason why reducing our reliance on Hadoop
classes was important (even with an HDFS shaded client), but maybe my
only consideration was long-term cleanliness (avoiding the
double-packaging of the same classes). I can't come up with an example
anymore.

Constructing some exemplars for this work would likely be the best kind
of "acceptance test". Maybe we can pull common use-cases and just create
sample stub-projects showing how they'd work with whatever we come up
with. Hopefully, this would help us minimize downstream burden.
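
As a rough illustration of what one such stub-project check could look
like (a sketch only, not anything that exists in HBase today): the class
below lists every classpath location that supplies a given class, so a
sample project could fail its build when a class shows up more than once.
The class name DuplicateClassCheck and the idea of passing class names on
the command line are assumptions made for this example.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for a sample stub project; not part of HBase or Hadoop.
public class DuplicateClassCheck {

    /** Converts a binary class name such as "a.b.C" to the resource path "a/b/C.class". */
    public static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location visible to the loader that provides a copy of the named class. */
    public static List<URL> locations(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(resourcePath(className)));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = DuplicateClassCheck.class.getClassLoader();
        boolean duplicates = false;
        for (String name : args) {
            List<URL> found = locations(name, loader);
            // More than one location means two jars ship the same class name.
            boolean dup = found.size() > 1;
            duplicates |= dup;
            System.out.println((dup ? "DUPLICATE " : "ok ") + name + " " + found);
        }
        // A non-zero exit code lets a build script treat duplicates as a failure.
        if (duplicates) {
            System.exit(1);
        }
    }
}
```

Run against the stub project's runtime classpath with the class names a
given use-case cares about; with a properly relocated (shaded) client the
expectation is that each un-relocated name resolves to at most one jar.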

Re coprocessors (CPs): I have not considered them either.
