accumulo-dev mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Hadoop 2 compatibility issues
Date Tue, 14 May 2013 23:41:56 GMT
On Tue, May 14, 2013 at 7:36 PM, Christopher <ctubbsii@apache.org> wrote:
> Benson-
>
> They produce different byte-code. That's why we're even considering
> this. ACCUMULO-1402 is the ticket under which our intent is to add
> classifiers, so that they can be distinguished.

Whoops, missed that.

Then how do people succeed in just fixing up their dependencies and using it?

In any case, speaking as a Maven-maven, classifiers are absolutely,
positively, a cure worse than the disease. If you want the details
just ask.
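
To make the classifier mechanics concrete, a downstream consumer would have to select a classified build with something like the sketch below (artifact, version, and classifier name are illustrative, not necessarily what ACCUMULO-1402 would use):

  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
    <!-- Illustrative classifier. Attached (classified) artifacts do not get
         their own POM, so this jar would still resolve whatever dependencies
         the main artifact's POM declares. -->
    <classifier>hadoop2</classifier>
  </dependency>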

>
> All-
>
> To Keith's point, I think perhaps all this concern is a non-issue...
> because as Keith points out, the dependencies in question are marked
> as "provided", and dependency resolution doesn't occur for provided
> dependencies anyway... so even if we leave off the profiles, we're in
> the same boat. Maybe not the boat we should be in... but certainly not
> a sinking one as I had first imagined. It's as afloat as it was
> before, when they were not in a profile, but still marked as
> "provided".
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Tue, May 14, 2013 at 7:09 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>> It just doesn't make very much sense to me to have two different GAVs
>> for the very same .class files, just to get different dependencies in
>> the poms. However, if someone really wanted that, I'd look to make
>> some scripting that created this downstream from the main build.
>>
>>
>> On Tue, May 14, 2013 at 6:16 PM, John Vines <vines@apache.org> wrote:
>>> They're the same currently. I was requesting separate GAVs for hadoop 2.
>>> It's been on the mailing list and jira.
>>>
>>> Sent from my phone, please pardon the typos and brevity.
>>> On May 14, 2013 6:14 PM, "Keith Turner" <keith@deenlo.com> wrote:
>>>
>>>> On Tue, May 14, 2013 at 5:51 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>>
>>>> > I am a Maven developer, and I'm offering this advice based on my
>>>> > understanding of the reason why that generic advice is offered.
>>>> >
>>>> > If you have different profiles that _build different results_ but all
>>>> > deliver the same GAV, you have chaos.
>>>> >
>>>>
>>>> What GAV are we currently producing for hadoop 1 and hadoop 2?
>>>>
>>>>
>>>> >
>>>> > If you have different profiles that test against different versions of
>>>> > dependencies, but all deliver the same byte code at the end of the
>>>> > day, you don't have chaos.
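
A sketch of the benign form of that pattern: the profile only swaps the version used for compiling and testing (property name and versions are illustrative, and it assumes the main dependencies section references ${hadoop.version}), so the published byte code and the POM's dependency list stay the same:

  <profiles>
    <profile>
      <id>hadoop-2.0</id>
      <properties>
        <!-- Only the build/test version changes; the profile adds or removes
             no dependencies, so downstream resolution is unaffected. -->
        <hadoop.version>2.0.4-alpha</hadoop.version>
      </properties>
    </profile>
  </profiles>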
>>>> >
>>>> >
>>>> >
>>>> > On Tue, May 14, 2013 at 5:48 PM, Christopher <ctubbsii@apache.org> wrote:
>>>> > > I think it's interesting that Option 4 seems to be most preferred...
>>>> > > because it's the *only* option that is explicitly advised against by
>>>> > > the Maven developers (from the information I've read). I can see its
>>>> > > appeal, but I really don't think that we should introduce an explicit
>>>> > > problem for users (that applies to users using even the Hadoop version
>>>> > > we directly build against... not just those using Hadoop 2... I don't
>>>> > > know if that point was clear), to only partially support a version of
>>>> > > Hadoop that is still alpha and has never had a stable release.
>>>> > >
>>>> > > BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402,
>>>> > > but am reluctant to apply that patch, with this issue outstanding,
>>>> > > as it may exacerbate the problem.
>>>> > >
>>>> > > Another implication for Option 4 (the current "solution") is for
>>>> > > 1.6.0, with the planned accumulo-maven-plugin... because it means that
>>>> > > the accumulo-maven-plugin will need to be configured like this:
>>>> > > <plugin>
>>>> > >   <groupId>org.apache.accumulo</groupId>
>>>> > >   <artifactId>accumulo-maven-plugin</artifactId>
>>>> > >   <dependencies>
>>>> > >    ... all the required hadoop 1 dependencies to make the plugin work,
>>>> > >    even though this version only works against hadoop 1 anyway...
>>>> > >   </dependencies>
>>>> > >   ...
>>>> > > </plugin>
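
A rough sketch of what that configuration might look like once filled in, with hadoop-client standing in for whichever Hadoop 1 artifacts turn out to be required (illustrative only, not a vetted list):

  <plugin>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-maven-plugin</artifactId>
    <dependencies>
      <!-- Illustrative: the plugin user re-declares Hadoop here because the
           plugin's POM cannot supply it transitively from a profile. -->
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>1.0.4</version>
      </dependency>
    </dependencies>
  </plugin>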
>>>> > >
>>>> > > --
>>>> > > Christopher L Tubbs II
>>>> > > http://gravatar.com/ctubbsii
>>>> > >
>>>> > >
>>>> > > On Tue, May 14, 2013 at 5:42 PM, Christopher <ctubbsii@apache.org> wrote:
>>>> > >> I think Option 2 is the best solution for "waiting until we have the
>>>> > >> time to solve the problem correctly", as it ensures that transitive
>>>> > >> dependencies work for the stable version of Hadoop, and using Hadoop2
>>>> > >> is a very simple documentation issue for how to apply the patch and
>>>> > >> rebuild. Option 4 doesn't wait... it explicitly introduces a problem
>>>> > >> for users.
>>>> > >>
>>>> > >> Option 1 is how I'm tentatively thinking about fixing it properly in
>>>> > >> 1.6.0.
>>>> > >>
>>>> > >>
>>>> > >> --
>>>> > >> Christopher L Tubbs II
>>>> > >> http://gravatar.com/ctubbsii
>>>> > >>
>>>> > >>
>>>> > >> On Tue, May 14, 2013 at 4:56 PM, John Vines <vines@apache.org> wrote:
>>>> > >>> I'm an advocate of option 4. You say that it's ignoring the problem,
>>>> > >>> whereas I think it's waiting until we have the time to solve the
>>>> > >>> problem correctly. Your reasoning for this is standardizing on Maven
>>>> > >>> conventions, but the other options, while more 'correct' from a Maven
>>>> > >>> standpoint, are a larger headache for our user base and ourselves. In
>>>> > >>> either case, we're going to be breaking some sort of convention, and
>>>> > >>> while it's not good, we should be doing the one that's less bad for
>>>> > >>> US. The important thing here, now, is that the poms work, and we
>>>> > >>> should go with the method that leaves the work minimal for our end
>>>> > >>> users to utilize them.
>>>> > >>>
>>>> > >>> I do agree that 1. is the correct option in the long run. More
>>>> > >>> specifically, I think it boils down to having a single module
>>>> > >>> compatibility layer, which is how HBase deals with this issue. But
>>>> > >>> like you said, we don't have the time to engineer a proper solution.
>>>> > >>> So let sleeping dogs lie and we can revamp the whole system for 1.5.1
>>>> > >>> or 1.6.0 when we have the cycles to do it right.
>>>> > >>>
>>>> > >>>
>>>> > >>> On Tue, May 14, 2013 at 4:40 PM, Christopher <ctubbsii@apache.org> wrote:
>>>> > >>>
>>>> > >>>> So, I've run into a problem with ACCUMULO-1402 that requires a larger
>>>> > >>>> discussion about how Accumulo 1.5.0 should support Hadoop2.
>>>> > >>>>
>>>> > >>>> The problem is basically that profiles should not contain
>>>> > >>>> dependencies, because profiles don't get activated transitively. A
>>>> > >>>> slide deck by the Maven developers points this out as a bad
>>>> > >>>> practice... yet it's a practice we rely on for our current
>>>> > >>>> implementation of Hadoop2 support
>>>> > >>>> (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven
>>>> > >>>> slide 80).
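
A rough sketch of the pattern under discussion: a dependency declared inside a profile, which downstream consumers never see because profile activation is not transitive (artifact id and version are illustrative only):

  <profiles>
    <profile>
      <id>hadoop-2.0</id>
      <dependencies>
        <!-- Illustrative: this dependency exists only when the profile is
             active in this build; projects depending on the published
             artifact do not inherit the profile, so it is never resolved
             for them. -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>2.0.4-alpha</version>
        </dependency>
      </dependencies>
    </profile>
  </profiles>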
>>>> > >>>>
>>>> > >>>> What this means is that even if we go through the work of publishing
>>>> > >>>> binary artifacts compiled against Hadoop2, neither our Hadoop1
>>>> > >>>> binaries nor our Hadoop2 binaries will be able to transitively
>>>> > >>>> resolve any dependencies defined in profiles. This has significant
>>>> > >>>> implications for user code that depends on Accumulo Maven artifacts.
>>>> > >>>> Every user will essentially have to explicitly add Hadoop
>>>> > >>>> dependencies for every Accumulo artifact that has dependencies on
>>>> > >>>> Hadoop, either because we directly or transitively depend on Hadoop
>>>> > >>>> (they'll have to peek into the profiles in our POMs and copy/paste
>>>> > >>>> the profile into their project). This becomes more complicated when
>>>> > >>>> we consider how users will try to use things like Instamo.
>>>> > >>>>
>>>> > >>>> There are workarounds, but none of them are really pleasant.
>>>> > >>>>
>>>> > >>>> 1. The best way to support both major Hadoop APIs is to have separate
>>>> > >>>> modules with separate dependencies directly in the POM. This is a fair
>>>> > >>>> amount of work, and in my opinion, would be too disruptive for 1.5.0.
>>>> > >>>> This solution also gets us separate binaries for separate supported
>>>> > >>>> versions, which is useful.
>>>> > >>>>
>>>> > >>>> 2. A second option, and the preferred one I think for 1.5.0, is to
>>>> > >>>> put a Hadoop2 patch in the branch's contrib directory
>>>> > >>>> (branches/1.5/contrib) that patches the POM files to support building
>>>> > >>>> against Hadoop2. (Acknowledgement to Keith for suggesting this
>>>> > >>>> solution.)
>>>> > >>>>
>>>> > >>>> 3. A third option is to fork Accumulo, and maintain two separate
>>>> > >>>> builds (a more traditional technique). This adds a merging nightmare
>>>> > >>>> for features/patches, but gets around some reflection hacks that we
>>>> > >>>> may have been motivated to do in the past. I'm not a fan of this
>>>> > >>>> option, particularly because I don't want to replicate the fork
>>>> > >>>> nightmare that has been the history of early Hadoop itself.
>>>> > >>>>
>>>> > >>>> 4. The last option is to do nothing and to continue to build with the
>>>> > >>>> separate profiles as we are, and make users discover and specify
>>>> > >>>> transitive dependencies entirely on their own. I think this is the
>>>> > >>>> worst option, as it essentially amounts to "ignore the problem".
>>>> > >>>>
>>>> > >>>> At the very least, it does not seem reasonable to complete
>>>> > >>>> ACCUMULO-1402 for 1.5.0, given the complexity of this
issue.
>>>> > >>>>
>>>> > >>>> Thoughts? Discussion? Vote on option?
>>>> > >>>>
>>>> > >>>> --
>>>> > >>>> Christopher L Tubbs II
>>>> > >>>> http://gravatar.com/ctubbsii
>>>> > >>>>
>>>> >
>>>>
