incubator-crunch-dev mailing list archives

From Gabriel Reid
Subject Re: release preparation and Avro versioning
Date Wed, 22 Aug 2012 19:43:04 GMT

I'm fine with defaulting to Avro 1.5.4 -- as I understand it, this doesn't get in the way
of people pulling in a newer version of Avro in their own poms, so I don't see it as a problem.

I personally think that we need to resolve CRUNCH-23 before we do a release -- however, I
think that there are still some compatibility issues with the patch in its current state,
and I can't commit to looking into it much in the coming days. I also don't really want to
delay the release.

What I'd like to propose is that we just set the sort to use a single reducer for now -- this
way the total order sorting will work, but just be less efficient. I think that having a slow
sort gives a better impression than having a broken sort, especially considering that sort
is a base operation that other operations might be built on top of, so if we have a broken
sort it could result in some very hard-to-find issues for users.
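To illustrate the trade-off, here is a minimal, self-contained sketch (plain Java, not Crunch or Hadoop code; the class and method names are hypothetical). It assumes keys are routed to reducers by hash, in the style of Hadoop's default hash partitioning, and that each reducer sorts only its own partition. With one reducer the concatenated output is totally ordered; with more than one it generally is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of a hash-partitioned sort; not part of Crunch.
final class SingleReducerSortSketch {

    // Routes each key to a partition by hash, sorts every partition on its own
    // (as each reducer would), and concatenates the results in partition order.
    static List<Integer> sortWithReducers(List<Integer> keys, int numReducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Integer key : keys) {
            int p = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            partitions.get(p).add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : partitions) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    // True if the list is in non-decreasing order.
    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(5, 2, 9, 4, 7, 1);
        // One reducer sees every key, so its output is globally sorted.
        System.out.println(sortWithReducers(keys, 1)); // [1, 2, 4, 5, 7, 9]
        // Two reducers are each sorted internally, but not across partitions.
        System.out.println(sortWithReducers(keys, 2)); // [2, 4, 1, 5, 7, 9]
    }
}
```

The single-reducer case is correct but funnels all data through one task, which is the efficiency cost mentioned above.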

Other than that, I'm all for a release, and very excited about reaching that milestone!

Hope everyone is enjoying their vacation (and actually not reading this until they're back
from vacation).

- Gabriel

On Wednesday 22 August 2012 at 08:50, Josh Wills wrote:

> Hey all,
> I just committed CRUNCH-16, which was the last of our open issues that I
> wanted to resolve before our first release. Although I look forward to the
> total ordering sort in CRUNCH-23 and refactoring the planner in CRUNCH-34,
> I feel fine holding off on them until the next release. If any of you feel
> differently or have any other features/bug fixes that you would like to get
> in, now would be a good time to discuss them and give an ETA on their
> arrival.
> Following Matthias' release proposal, we should create a release branch and
> do the final preparations for the release against it. In my mind, that
> consists of removing the SNAPSHOT labels from the POMs and, more
> importantly, deciding on the Avro versions that will be supported in 0.3.0.
> For most of the release, we've been working against 1.6.2 or 1.7.0. But
> Hadoop 2.0.0-alpha runs against Avro 1.5.x, which is not API compatible
> with either, and has certain limitations w/respect to mixing specific and
> reflection-based schemas that can cause problems in certain use cases.
> Fortunately, Gabriel's changes as part of CRUNCH-16 dynamically check which
> functionality the version of Avro in use supports and adapt our handling
> accordingly.
> There is no reason that we couldn't run with Avro 1.7.x on Hadoop 1.0.3 and
> run on Avro 1.5.x on Hadoop 2.0.0-alpha, but it feels a little odd to have
> users go backwards in terms of the capabilities of the API when they move
> to a later version of Hadoop. Therefore, I think that the release branch
> for Crunch 0.3.0 should default to using Avro 1.5.4 for both 1.0.3 and
> 2.0.0-alpha, in order to minimize the surprise that a new user would
> encounter in working with the release. We can of course have documentation
> on the Wiki explaining the issue and notifying users of how to upgrade
> their Avro version by changing the pom, as we verified that CRUNCH-16 will
> work on Avro 1.5.x, 1.6.x, and 1.7.x.
> I am on vacation tomorrow through Sunday and will be out of phone/email/IM
> contact for that entire time. (I'm really looking forward to the downtime.)
> It feels good to be close to a release. I'll check back in with the list on
> Sunday evening to see where everyone is at.
> J 
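
The quoted message mentions checking at runtime what the Avro version on the classpath supports. The thread does not show how CRUNCH-16 does this, so the following is only a sketch of one common JVM approach, probing for classes and methods by reflection. The class name `FeatureProbeSketch` and its helpers are hypothetical and are not Crunch or Avro APIs.

```java
// Hypothetical reflection-based capability probe; not the CRUNCH-16 implementation.
final class FeatureProbeSketch {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean hasClass(String className) {
        try {
            Class.forName(className, false, FeatureProbeSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns true if the named class exists and exposes a public method
    // with the given name and parameter types.
    static boolean hasMethod(String className, String methodName, Class<?>... parameterTypes) {
        try {
            Class<?> cls = Class.forName(className, false, FeatureProbeSketch.class.getClassLoader());
            cls.getMethod(methodName, parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Result depends on whether an Avro jar is on the classpath.
        System.out.println("Avro present: " + hasClass("org.apache.avro.Schema"));
        // JDK example so the probe can be tried without any extra dependency.
        System.out.println("String.isEmpty present: " + hasMethod("java.lang.String", "isEmpty"));
    }
}
```

Code built this way can pick a fallback path when a probe returns false instead of failing with a linkage error on an older library version.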
