accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: MiniCluster and "provided" scope dependencies
Date Tue, 24 Sep 2013 16:48:52 GMT
On Tue, Sep 24, 2013 at 12:31 PM, Keith Turner <> wrote:
> On Tue, Sep 24, 2013 at 11:57 AM, Josh Elser <> wrote:
>> I'm curious to hear what people think on this.
>> I'm a really big fan of spinning up a minicluster instance to do some
>> "more real" testing of software as I write it.
>> With 1.5.0, it's a bit more painful because I have to add a bunch more
>> dependencies to my project (which previously would only have to depend
>> on the accumulo-minicluster artifact). The list includes, but is
>> likely not limited to, commons-io, commons-configuration,
>> hadoop-client, zookeeper, log4j, slf4j-api, slf4j-log4j12.
>> Best as I understand it, the intent of this was that Hadoop will
>> typically provide these artifacts at runtime, and therefore Accumulo
>> doesn't need to re-bundle them itself which I'd agree with (not
>> getting into that whole issue about the Hadoop "ecosystem"). However,
>> I would think that the minicluster should have non-provided scope
>> dependencies declared on these, as there is no Hadoop installation --
> Would this require declaring dependencies on a particular version of hadoop
> in the minicluster pom?  Or could the minicluster pom have profiles for
> different hadoop versions?  I do not know enough about maven to know if you
> can use profiles declared in a dependency (e.g. if a user depends on
> minicluster, can they activate profiles in it?)

The actual dependency in minicluster is against Apache Hadoop, but
that's beside the point.

Marking the hadoop-client dependency as provided means that Hadoop's
dependencies are *not* included at runtime (because hadoop is
provided and, as such, so are its dependencies). In other words, this
has nothing to do with what's actually included in a distribution of
Hadoop when you download and install it.

Apache Hadoop has dependencies we need to run minicluster. By marking
the hadoop-client artifact as 'provided', we do not get its
dependencies and the minicluster fails to run. I think this is easy
enough to work around by overriding the dependencies we need to run
the minicluster in the minicluster module (e.g. make the hadoop-client
not 'provided' in the minicluster module). Thus, as we add more things
to the minicluster that require other libraries, we control the
dependency management instead of forcing that onto the user.
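
A minimal sketch of that override, assuming the parent pom manages
hadoop-client with 'provided' scope and supplies its version through
dependencyManagement (both are assumptions about the build layout, not
something confirmed here):

```xml
<!-- Hypothetical fragment for the minicluster module's pom.xml. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Version omitted on the assumption that the parent pom's
         dependencyManagement section supplies it. -->
    <!-- An explicit scope here overrides the managed 'provided' scope,
         so hadoop-client and its transitive dependencies are passed on
         to projects that depend on minicluster. -->
    <scope>compile</scope>
  </dependency>
</dependencies>
```

With something like this in place, a project depending only on the
minicluster artifact should pick up Hadoop's dependencies transitively
rather than having to list them itself.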

>> there's just the minicluster. As such, this would save users from
>> having to dig into our dependency management or use trial and error
>> to figure out what "extra" dependencies they have to include in
>> their project to actually make it work.
>> Thoughts?
>> - Josh
