accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: "Provided" dependencies
Date Wed, 06 Nov 2013 23:45:23 GMT
On Wed, Nov 6, 2013 at 5:43 PM, Michael Berman <mberman@sqrrl.com> wrote:
> I think it would be nice to separate what client API users need from the
> provided dependencies issue.  It seems like whatever module client
> projects depend on should itself only have dependencies on things that it
> actually needs.  If it doesn't need hadoop, then it shouldn't declare it as
> a dependency at all.  The hadoop-dependent server and the
> hadoop-independent client interface both need to share intermediate
> objects, but it seems like those could be defined in another, common
> hadoop-independent module.

What you're talking about is ACCUMULO-1483, which is a separate, but
related, issue (https://issues.apache.org/jira/browse/ACCUMULO-1483)
for creating a minimal API jar (accumulo-client-api), to include at
compile time, so users don't need to include the full dependency tree
when only writing client code. The client code does need Hadoop
right now (some of our client code accepts or returns hadoop Text
objects). I would hope that the implementation of 1483 would eliminate
those cases.

> In/Outputformats are an exception, but I agree they would be best separated
> into their own hadoop-dependent module (which might itself depend on the
> client module).

Also related to ACCUMULO-1483... perhaps as a subtask to create an
accumulo-client-mapreduce module (or leave these in the accumulo-core
module).
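A rough sketch of how that split might look in the dependency section of the proposed accumulo-client-mapreduce module. The module names come from this discussion, but neither module exists yet. The coordinates, the `${project.version}` and `${hadoop.version}` properties, and the assumption that Hadoop comes in as `org.apache.hadoop:hadoop-client` are all illustrative:

```xml
<!-- Hypothetical accumulo-client-mapreduce/pom.xml (dependencies only) -->
<dependencies>
  <!-- Proposed Hadoop-free client API jar from ACCUMULO-1483 (hypothetical) -->
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-client-api</artifactId>
    <version>${project.version}</version>
  </dependency>
  <!-- The In/OutputFormats need Hadoop, so only this module declares it.
       It uses the default (compile) scope, so it propagates to users of this module. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Under this layout, and assuming 1483 removes the Hadoop `Text` types from the client API, client code that never touches MapReduce would depend only on accumulo-client-api and would not pull Hadoop onto its classpath.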

> As far as the provided question goes, it seems to me that the only reason
> to mark a dep provided is if we think developers will *usually* want to
> compile against different versions.  Initially I thought it would make
> sense if we thought the runtime versions would vary, but Chris makes a good
> point that the deps we include in the distributed package can be selected
> independently of the maven dep scope.  Since you can build accumulo against
> any version of hadoop and it will still run against any other version of
> hadoop, I think it's better to make things easier on us by having it
> compile scoped.

I think you're right that the real question is whether users *usually*
need to specify a different version. However, even if they do need to
specify a different version, I think it makes more sense for them to
rely on their dependencyManagement section to select a specific
version, or to use exclusions and explicitly declare the dependency that
provides the required classes.
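The two user-side options might look like this in a downstream project's POM. This sketch assumes the project depends on `org.apache.accumulo:accumulo-core` and receives Hadoop transitively through `org.apache.hadoop:hadoop-client`. The actual Hadoop artifact depends on the Accumulo release and build profile. `accumulo.version` and `hadoop.version` are placeholder properties the project would define itself. The two fragments are alternatives, so a project would pick one:

```xml
<!-- Option 1 (inside <project>): keep the normal dependency and pin the
     transitive Hadoop version. dependencyManagement also governs versions
     of artifacts pulled in transitively. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>${accumulo.version}</version>
  </dependency>
</dependencies>

<!-- Option 2 (inside <project>): exclude Accumulo's Hadoop dependency and
     explicitly declare the one that provides the required classes. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>${accumulo.version}</version>
    <exclusions>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

Option 1 changes only the version and leaves the dependency graph intact. Option 2 is useful when the replacement classes come from a differently named artifact, because an exclusion removes the artifact entirely.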

> If someone depends on the accumulo server, then they may have to exclude
> the transitive dependency if our hadoop is polluting theirs, but I think
> that issue can be mitigated by not requiring client apps to depend on the
> entire server.

Right. That will be solved with ACCUMULO-1483, which I'm going to
tackle in the next dev cycle.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


> On Wed, Nov 6, 2013 at 5:17 PM, Joey Echeverria <joey+ml@clouderagovt.com> wrote:
>
>> Do Accumulo users need Hadoop or its dependencies in order to use the
>> client APIs?
>>
>> The only client API that I could see needing it would be the
>> [In|Out]putFormats, but it'd be cool if that was a separate module and
>> that module had the appropriate Hadoop dependencies with the compile
>> scope.
>>
>> -Joey
>>
>> On Wed, Nov 6, 2013 at 5:05 PM, Christopher <ctubbsii@apache.org> wrote:
>> > What's the latest opinion on whether things should be marked "provided" in
>> > the pom?
>> > I've changed my mind on this a few times, myself, so I'm curious what
>> > others think.
>> >
>> > The provided scope means that it will not propagate as a transitive
>> > dependency. Other than that, it doesn't do much... though we can
>> > control packaging based on provided or not.
>> >
>> > I'm not sure this gets us much, and it's inconvenient for users. We
>> > can control packaging in other ways (like being more explicit and
>> > carefully considering which dependencies we include in an RPM or
>> > tarball, for instance).
>> >
>> > If we drop the provided declaration, then users who want to build with
>> > Accumulo as a dependency, but against a different version of Hadoop
>> > than the one we declare in our POM, will have to explicitly exclude
>> > the hadoop dependencies (using <exclusions>) and redeclare them, or
>> > use their <dependencyManagement> section to force a particular
>> > version of hadoop.
>> >
>> > The advantage to users, though, if we drop this, is that they won't
>> > have to constantly re-declare transitive dependencies to get their
>> > projects to build/test/run.
>> >
>> > See http://s.apache.org/maven-dependency-scopes
>> >
>> > Thoughts?
>> >
>> > --
>> > Christopher L Tubbs II
>> > http://gravatar.com/ctubbsii
>>
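To make the question concrete: on Accumulo's side, the two choices differ only in the `<scope>` element of the Hadoop dependency. This is a sketch that again assumes Hadoop is declared as `org.apache.hadoop:hadoop-client` with a `hadoop.version` property:

```xml
<!-- provided: on Accumulo's own compile and test classpath, but NOT
     passed on transitively to projects that depend on Accumulo. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>

<!-- compile (the default, so <scope> is omitted): propagates transitively.
     Users who need another Hadoop version override it with
     <dependencyManagement> or an <exclusion>. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```

As noted in the thread, the jars bundled into an RPM or tarball can still be selected independently of either scope.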
