crunch-dev mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Dependency conflicts
Date Tue, 07 Aug 2012 18:58:17 GMT
+1, split it out. I totally agree with the points made by Matthias. 


On Tuesday 7 August 2012 at 00:01, Josh Wills wrote:

> I'm +1 to split the hbase code out from core into crunch-hbase. Any
> objections?
> 
> On Mon, Aug 6, 2012 at 10:58 AM, Matthias Friedrich <matt@mafr.de> wrote:
> 
> > Hi Josh,
> > 
> > hadoop-1.0.3 + hbase-0.94.0 + crunch didn't work for me. It requires
> > avro-1.5.3 and doesn't compile with avro-1.7.0; but I think the real
> > problem is their use of MethodUtils from commons-lang-2.5 which isn't
> > in Hadoop's commons-lang-2.4.
> > 
> > Of course, we can use hbase-0.90.5, downgrade crunch to avro-1.3.3 and
> > thrift-0.2.0, pray that jersey is irrelevant, tie all other HBase
> > dependencies to the versions Hadoop uses and hope that it works. It
> > may work, at the price of forcing some old versions on our users. But
> > whether it works or not isn't the main point; let's look at it from
> > the user's perspective.
> > 
> > When you use a third-party framework like Hadoop, your application
> > inherits the framework's classpath (*). This means any other dependency
> > your application has (including transitive dependencies) has to be
> > compatible with the framework's dependencies. The more complex your
> > application is, the more this hurts you. You can't update your
> > dependencies because the framework locks you in. Porting existing,
> > complex applications to the framework is nearly impossible.
> > 
> > I've seen this many times; that's why I evaluate my dependencies
> > carefully. Crunch itself is pretty minimal when it comes to
> > its direct dependencies (we could be even more minimal with little
> > effort). With HBase, however, things look a lot more difficult
> > and that's going to scare users away.
> > 
> > I think if we have the chance to make HBase support an optional
> > feature, much like MapReduce support is optional in Avro, then we
> > should take it.
> > 
> > Users are very thankful when you leave them a choice. I'm a user, I
> > know. I've evaluated dozens of libraries and frameworks and dismissed
> > quite a few because of dependency conflicts. If you're organized well
> > enough to have an evaluation checklist, then this will be on it. I'd
> > like to use Crunch in production one day without bending the rules, so
> > let's lower the barrier to adoption.
> > 
> > Regards,
> > Matthias, stepping off the soap box
> > 
> > (*) Yes, I know about classloader isolation in Java EE and
> > HADOOP_USER_CLASSPATH_FIRST.
> > 
> > On Sunday, 2012-08-05, Josh Wills wrote:
> > > Hey Matthias,
> > > 
> > > I'm not quite willing to give up on hbase just yet-- how does 1.0.3
> > > +Crunch look against hbase 0.94? Is the primary issue the Avro 1.7.0
> > > conflicts?
> > > 
> > > J
> > > 
> > > > On Sun, Aug 5, 2012 at 2:10 AM, Matthias Friedrich <matt@mafr.de> wrote:
> > > > Hi,
> > > > 
> > > > I spent most of Saturday resolving dependency conflicts for CRUNCH-16.
> > > > Since nobody's going to read a long mail, here are the cliff notes:
> > > > 
> > > > hadoop-core-1.0.3, hbase-0.90.5, and avro-1.7.0 are incompatible and
> > > > I found no safe solution to fix it. Moving HBase support to a separate
> > > > Maven module may be the best solution because it reduces risk for
> > > > users who don't need HBase.
> > > > 
> > > > 
> > > > The longer version:
> > > > 
> > > > The POM of hadoop-core-1.0.3 is in a sorry state. It doesn't list all
> > > > libraries that are on the runtime classpath, and of these, some are
> > > > wrong. For example, integration tests using LocalJobRunner don't work
> > > > unless you add more dependencies yourself (e.g. commons-io). Also,
> > > > roughly a dozen of hbase-0.90.5's 40 dependencies are in conflict with
> > > > hadoop-core-1.0.3. This means we have to add quite a few "provided"
> > > > dependencies with the correct versions ourselves, but these aren't
> > > > propagated to our users so they have to do the same or risk conflicts
> > > > at runtime.
> > > > 
> > > > I resolved the conflicts to a point where our integration tests work
> > > > which is unfortunately no guarantee that things will work for our
> > > > users.
> > > > Using the dependencies of hadoop-core-1.0.3 + Crunch's, the source
> > > > distribution of hbase-0.90.5 doesn't even compile. At an interface
> > > > level, it is incompatible with protobuf-java-2.4.1 (easy enough to fix)
> > > > and avro-1.7.0 (not so easy to fix). Changing only those dependencies
> > > > that are interface compatible (about a dozen) unsurprisingly leads to
> > > > HBase test case failures. This may not affect HBase clients, but you
> > > > never know. There is no hbase-client library so you always get
> > > > everything unless you know HBase well enough to get your exclusions
> > > > right.
> > > > 
> > > > 
> > > > So, where do we go from here? I can get a patch ready that paints
> > > > over some of these problems and makes sure that the dependencies we
> > > > use in our test cases are the same as during runtime. But I really
> > > > need careful review for this.
> > > > 
> > > > To be honest, this situation leaves me a bit uneasy. Maybe the best
> > > > long-term solution would be to move HBase support to a separate Maven
> > > > module that depends on crunch core and not force it on everyone. This
> > > > will reduce risk greatly for those who don't need HBase. I think it's
> > > > definitely worth giving it a shot.
> > > > 
> > > > What do you think, guys?
> > > > 
> > > > Regards,
> > > > Matthias
> > > 
> > 
> 
> 
> 
> 
> 
> -- 
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
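
For anyone trying to reproduce the MethodUtils problem described in the thread (HBase expecting commons-lang-2.5 while Hadoop ships commons-lang-2.4), it can help to check which jar a class is actually loaded from at runtime. The following is only a rough sketch, not something taken from the Crunch or HBase code. The class name `ClasspathProbe` is made up for illustration, and the fully qualified name `org.apache.commons.lang.reflect.MethodUtils` is an assumption about where that class lives in commons-lang. It uses only JDK calls.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Crunch): reports which jar or directory
// a class is loaded from, or that it is missing from the classpath.
public class ClasspathProbe {

    static String locate(String className) {
        try {
            // initialize=false: find the class without running its static initializers
            Class<?> cls = Class.forName(
                    className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK core classes typically have no code source
                return className + " -> (bootstrap or unknown location)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Assumed class name; pass other names as arguments to probe them instead.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.commons.lang.reflect.MethodUtils"};
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

Run with the same classpath as the job or the integration tests. If the class resolves to Hadoop's commons-lang jar, or is reported as not found, that would be consistent with the conflict Matthias describes. It only shows which copy wins on the classpath and does not prove that the versions are compatible.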



