accumulo-dev mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: [DISCUSS] packaging our dependencies
Date Tue, 13 May 2014 14:55:37 GMT
Does anyone know if we have build/packaging documentation that lists where
we currently expect to get each of our runtime dependencies from, and when
we decide to repackage something that we know is already present in the
environment?

I know the assembly file lists which dependencies we package in our binary
assembly and git blame will explain why e.g. commons-math is present. But
it's easy for the git history to get complicated enough for that lookup to
not really work.

I'm looking for detail at the level of which Hadoop sub-component provides
each dependency, rather than just "is in the hadoop dist," so we have an
easier time seeing what the impact of this change would be.

Also this would make it easier to see if there are other version mismatches
like ACCUMULO-2791.
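
As a rough illustration of the kind of check I mean, here is a minimal sketch that scans a set of jar directories and reports artifacts present in more than one version. It assumes jars follow the usual artifact-version.jar naming; the function names and the example paths are hypothetical, not anything in our build.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: jars are named "<artifact>-<version>.jar", with the version
# starting with a digit (e.g. "guava-15.0.jar", "commons-math-2.1.jar").
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def parse_jar_name(filename):
    """Split a jar filename into (artifact, version); None if no version is found."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def find_version_mismatches(directories):
    """Return {artifact: {version: [directories]}} for artifacts seen in >1 version."""
    seen = defaultdict(lambda: defaultdict(list))
    for directory in directories:
        # Only the top level of each directory is scanned, not subdirectories.
        for jar in sorted(Path(directory).glob("*.jar")):
            parsed = parse_jar_name(jar.name)
            if parsed is None:
                continue  # unversioned jar names are skipped
            artifact, version = parsed
            seen[artifact][version].append(str(directory))
    return {a: dict(v) for a, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    # Hypothetical locations; substitute the directories on your own classpath.
    dirs = ["/opt/accumulo/lib", "/opt/hadoop/share/hadoop/common/lib"]
    for artifact, versions in sorted(find_version_mismatches(dirs).items()):
        print(artifact, versions)
```

This only compares file names, so it would miss jars that were renamed or shaded, but it might be enough to flag candidates worth looking at by hand.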



On Mon, May 12, 2014 at 7:36 PM, Joey Echeverria <jecheverria@clouderagovt.com> wrote:

> Packaging other jars that had been made available at runtime by virtue of
> their existence in the Hadoop directories.
>
>
> I'm only talking about dependencies that were/are provided by Hadoop.
>
>
>
>
> But since you brought up ZooKeeper, my understanding is that ZK intends
> for dependent projects to only rely on the ZK jar that is in the top level
> of the tarball. If you need other jars, you should package those yourself.
> WARNING: my info about ZK may be out of date as it's been a long time since
> I spoke to the project about how they intend services that rely on it to be
> consumed.
>
> On Mon, May 12, 2014 at 7:30 PM, Christopher <ctubbsii@apache.org> wrote:
>
> > Does that mean package everything else?
> > What about ZooKeeper?
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> > On Mon, May 12, 2014 at 3:38 PM, Joey Echeverria <joey@clouderagovt.com> wrote:
> >> +1 to only depending on Hadoop client jars.
> >>
> >>
> >> --
> >> Joey Echeverria
> >> Chief Architect
> >> Cloudera Government Solutions
> >>
> >>
> >> On Sun, May 11, 2014 at 6:07 PM, Christopher <ctubbsii@apache.org> wrote:
> >>> In general, I think this is reasonable... especially because Hadoop
> >>> Client stabilizes things a bit. On the other hand, things get really
> >>> complicated with dependencies in the pom (somewhat complicated), and
> >>> packaged dependencies (more complicated), when we're talking about
> >>> supporting both Hadoop 1 and Hadoop 2. I know some of us want to drop
> >>> Hadoop 1 support in 2.0.0, and I think this is one more good reason to
> >>> do that.
> >>>
> >>> Another data point that I think is going to complicate things a (very)
> >>> tiny bit: the work on ACCUMULO-2589 includes things like: drop the
> >>> dependencies on Hadoop from the API. But, we're likely to still have a
> >>> dependency on guava (there was a suggestion to use guava's @Beta
> >>> annotations in the API). Maybe this is fine.... because the packaging
> >>> considerations for the binary tarball are not the same as the API
> >>> module dependencies (though they'll have to be compatible), but it's
> >>> something to consider.
> >>>
> >>> --
> >>> Christopher L Tubbs II
> >>> http://gravatar.com/ctubbsii
> >>>
> >>>
> >>> On Sun, May 11, 2014 at 4:45 PM, Sean Busbey <busbey@cloudera.com> wrote:
> >>>> ACCUMULO-2786 has brought up the issue of what dependencies we bring with
> >>>> Accumulo rather than depend on the environment providing[1].
> >>>>
> >>>> Christopher explains our extant reasoning thus
> >>>>
> >>>>> The precedent has been: if vanilla Apache Hadoop provides it in its bin
> >>>>> tarball, we don't need to.
> >>>>
> >>>> I'd like us to move to packaging any dependencies that aren't brought in by
> >>>> Hadoop Client.
> >>>>
> >>>> 1) Our existing practice developed before Hadoop Client existed, so we
> >>>> essentially *had* to have all of the Hadoop related deps on our classpath.
> >>>> For versions where we default to Hadoop 2, we can improve things.
> >>>>
> >>>> 2) We should encourage users to follow good practice by minimizing the
> >>>> number of jars added to the classpath.
> >>>>
> >>>> 3) We have to still include the jars found in Hadoop Client because we use
> >>>> hadoop.
> >>>>
> >>>> 4) Limiting the dependencies we rely on external sources to provide allows
> >>>> us to update more of our dependencies to current versions.
> >>>>
> >>>> 5) Minimizing the number of jars we rely on from external sources reduces
> >>>> the chances that they change out from under us (and thus reduces the number
> >>>> of external factors we have to remain cognizant of)
> >>>>
> >>>> 6) Minimizing the classpath reduces the chances of having multiple
> >>>> different versions of the same library present.
> >>>>
> >>>> I'd also like for us to *not* package any of the jars brought in by Hadoop
> >>>> Client. Due to the additional work it would take to downgrade our version
> >>>> of guava, I'd like to wait to do that.
> >>>>
> >>>> [1]: https://issues.apache.org/jira/browse/ACCUMULO-2786
> >>>>
> >>>> --
> >>>> Sean
>



-- 
Sean
