hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: [VOTE] Direction for Hadoop development
Date Wed, 08 Dec 2010 03:33:35 GMT

On Nov 29, 2010, at 4:22 PM, Owen O'Malley wrote:

>> I do not support adding new dependencies to the classpath of MapReduce user
>> tasks.
> That isn't reasonable. As Hadoop evolves, we have and will continue to add
> dependencies. For example, in your last MapReduce (MAPREDUCE-980) patch you
> added avro and paranamer as dependencies.

As a non-PMC member:

Hadoop already puts enough on the classpath that I have to make a custom build to use it.
This started in 0.19, and now no distribution works without modification, because Hadoop
keeps stuffing more and more things onto the classpath.

It is completely reasonable to ask that the environment user code runs in not be polluted
with libraries that are not exposed in the Hadoop API, and to debate the merits of a patch based
on whether it adds another jar to that classpath.

Webapp containers, OSGi, other classloader systems, or dependency rebasing (Jar Jar Links, the
Maven Shade plugin, etc.) help solve this sort of mess. Even more crudely, the lib directory
exposed to user code doesn't have to be Hadoop's full lib directory, and the order in which
jars are included can help.
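To make the classloader option concrete, here is a minimal Java sketch of the common "child-first" technique: a task's own jars are searched before the framework's classpath, so a user's version of a library wins over whatever Hadoop bundles. The class `ChildFirstClassLoader` and its `parentFirstPrefixes` parameter are hypothetical illustrations, not part of Hadoop's API, and the sketch only covers class loading, not resource lookup.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch: a class loader that looks in the user's jars first and only
 * falls back to the parent (framework) class loader when a class is not found there.
 */
public class ChildFirstClassLoader extends URLClassLoader {
    // Package prefixes that must always come from the parent, so that user code and
    // the framework share the same Class objects for API types (illustrative values).
    private final String[] parentFirstPrefixes;

    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent, String... parentFirstPrefixes) {
        super(userJars, parent);
        this.parentFirstPrefixes = parentFirstPrefixes.clone();
    }

    private boolean isParentFirst(String name) {
        // JDK classes are always delegated to the parent.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return true;
        }
        for (String prefix : parentFirstPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name)) {
                    // Standard parent-first delegation for JDK and framework API classes.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: search only the user's jars.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInUserJars) {
                        // Fall back to the framework's classpath.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Keeping `java.*` and the framework's API packages parent-first matters: if user code loaded its own copy of an interface the framework also loads, the two `Class` objects would differ and casts between them would fail.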

Either way, if Hadoop wants to be an application execution framework, it can't just throw
whatever it wants on the classpath forever.
If Hadoop wants to provide lots of tools as part of a rich environment for users, users have
to be able either to easily _opt in_ to having those tools on their classpath or to
_opt out_ of having them there.
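An opt-in model could look roughly like the following Java sketch. It assumes, hypothetically, that the framework knows which jars back its public API and keeps its other bundled libraries in a registry keyed by name. `TaskClasspath.build` and every jar name here are illustrative only, not an existing Hadoop mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of opt-in classpath assembly for a user task: only the jars
 * backing the public API are always present; bundled extras appear only on request.
 */
public final class TaskClasspath {
    private TaskClasspath() {}

    public static List<String> build(List<String> apiJars,
                                     Map<String, String> bundledExtras,
                                     Set<String> optedIn,
                                     List<String> userJars) {
        // Always include the jars that the public API depends on.
        List<String> classpath = new ArrayList<>(apiJars);
        // Add only the bundled libraries the job explicitly asked for. The iteration
        // order of optedIn decides their order; pass a LinkedHashSet for determinism.
        for (String name : optedIn) {
            String jar = bundledExtras.get(name);
            if (jar == null) {
                throw new IllegalArgumentException("unknown bundled library: " + name);
            }
            classpath.add(jar);
        }
        // The user's own jars go last in this sketch.
        classpath.addAll(userJars);
        return classpath;
    }
}
```

For example, with API jar `hadoop-core.jar`, extras `avro` and `jackson`, an opt-in set of just `avro`, and user jar `my-job.jar`, the result is `[hadoop-core.jar, avro.jar, my-job.jar]`; `jackson.jar` never reaches the task. Placing user jars last means they only win a conflict if nothing earlier provides the same class; a child-first loader inverts that.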

Now, this is really a tangent to the other issues at hand.
I'd suggest that, rather than pointing fingers over who added what to which classpath and
when, we simply acknowledge that classpath management is a problem Hadoop needs to solve rather
than ignore. I'm fairly sure there is already a JIRA for it somewhere.
Until it is solved to some degree (on a scale of 1 to 10 for dealing with classpath collisions,
Hadoop currently rates somewhere between 0 and 1), it's going to limit what can be built without
causing user applications to break on an upgrade. Whether those new features are good or
bad on their own merits is being conflated with the classpath problems they introduce for users.

> -- Owen
