crunch-dev mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: refactoring crunch-archetype
Date Tue, 12 Mar 2013 18:19:31 GMT
Hi guys,

Speaking of the archetype, I just tried to use it today (actually for the first time), and
it seems that there's an issue with it: when I tried to run the generated project within
Eclipse, I ran into a class version conflict. The version of commons-codec that is pulled in
by commons-httpclient doesn't match the version used by Hadoop.
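
One way to confirm a conflict like this is to ask the JVM which jar a class was actually loaded from. The Java sketch below does that with the standard getProtectionDomain().getCodeSource() calls. The class name JarLocator is hypothetical, and the default class org.apache.commons.codec.binary.Base64 is only an illustrative commons-codec class.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical helper: report which jar (or directory) a class was loaded
// from, to spot a mismatched copy of a library such as commons-codec.
final class JarLocator {
    private JarLocator() {}

    static Optional<URL> locate(String className) throws ClassNotFoundException {
        // initialize=false: load the class without running its static initializers.
        Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // JDK bootstrap classes (e.g. java.lang.String) have no code source.
        return source == null ? Optional.empty() : Optional.ofNullable(source.getLocation());
    }

    public static void main(String[] args) {
        // Illustrative default; pass any class name on the command line instead.
        String name = args.length > 0 ? args[0] : "org.apache.commons.codec.binary.Base64";
        try {
            String where = locate(name).map(URL::toString).orElse("(no code source)");
            System.out.println(name + " loaded from " + where);
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Running it with the same classpath as the failing Eclipse launch shows which commons-codec jar wins; after adding an exclusion, it should point at the jar Hadoop expects.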

I was going to fix this by adding an exclusion for commons-codec to the commons-httpclient
dependency. Josh, if you're going to work on the archetype in the short term, I'll just
leave this as it is so it can be tackled as part of the refactoring. Are you planning on
doing the refactoring in the short term? If not, I'll fix the archetype now.

- Gabriel


On 12 Mar 2013, at 09:33, Matthias Friedrich <matt@mafr.de> wrote:

> Hi,
> 
> sure, feel free to take this on. The tricky thing is to make sure that
> the generated project has correct dependencies for both Hadoop 1 and 2.
> 
> Last time I tried this (and failed due to bugs in the archetype plugin),
> I used Velocity templates and introduced a new archetype variable so
> that the user could select whether he's creating a Hadoop 1 or 2 project.
> Maybe you'll get it working; there has been a new release of the
> archetype plugin since then.
> 
> Shout if you need any help.
> 
> Regards,
>  Matthias
> 
> On Monday, 2013-03-11, Josh Wills wrote:
>> I cc'd everyone else on here, but since this was your module, I thought it
>> best to solicit your opinion before refactoring it.
>> 
>> We never managed to get crunch-archetype working with Hadoop 2.x, which is
>> apparently deprecating the lib/* trick for including client dependencies in
>> favor of the -libjars option (see
>> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>> and http://architects.dzone.com/articles/using-libjars-option-hadoop).
>> 
>> The way I have found to do this in Maven is to use the copy-dependencies
>> goal of the maven-dependency-plugin and include a shell script in a bin/
>> directory that knows how to set up the HADOOP_CLASSPATH and -libjars
>> arguments for use with hadoop jar. Although this approach is more complex
>> than the lib/* trick, it will support Hadoop 1.x as well as Hadoop 2.x.
>> 
>> Do you have any objections to me taking this on, and/or any other landmines
>> I should keep an eye out for?
>> 
>> Thanks!
>> Josh
>> 
>> -- 
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
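
For reference, the lib/* trick discussed in this thread relies on dependency jars being packaged in a lib/ directory inside the job jar. The Java sketch below lists those bundled jars using java.util.jar.JarFile. The class name BundledLibs and the default path target/my-job.jar are hypothetical, and only considering jars directly under lib/ (not subdirectories) is an assumption about how the convention is applied.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper: list the dependency jars bundled under lib/ in a job jar.
final class BundledLibs {
    private BundledLibs() {}

    static List<String> bundledJars(String jobJarPath) throws IOException {
        try (JarFile jar = new JarFile(jobJarPath)) {
            // Collect inside the try block, before the JarFile is closed.
            return jar.stream()
                    .map(entry -> entry.getName())
                    .filter(BundledLibs::isTopLevelLibJar)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Assumption: only jars directly under lib/ count, e.g. "lib/foo.jar".
    static boolean isTopLevelLibJar(String name) {
        return name.startsWith("lib/") && name.lastIndexOf('/') == 3 && name.endsWith(".jar");
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "target/my-job.jar";
        bundledJars(path).forEach(System.out::println);
    }
}
```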

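The bin/ launcher script Josh describes has to turn a directory of copied dependency jars into two values: a HADOOP_CLASSPATH value (entries joined with the platform path separator, for the client JVM) and a comma-separated -libjars list (shipped with the job). The sketch below shows that computation in Java rather than shell. The class name LibJarsArgs is hypothetical; target/dependency is the default output directory of the copy-dependencies goal. Note that -libjars is a generic option parsed by GenericOptionsParser, so the job's driver is assumed to run through ToolRunner.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper mirroring what a bin/ launcher script would compute.
final class LibJarsArgs {
    private LibJarsArgs() {}

    /** Jars in the directory, sorted by path so the output is deterministic. */
    static List<String> jarPaths(File dir) {
        File[] files = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (files == null) {
            return new ArrayList<>(); // not a directory, or not readable
        }
        return Arrays.stream(files)
                .map(File::getPath)
                .sorted()
                .collect(Collectors.toList());
    }

    /** HADOOP_CLASSPATH entries are joined with the path separator (':' on Unix). */
    static String hadoopClasspath(List<String> jars) {
        return String.join(File.pathSeparator, jars);
    }

    /** -libjars expects a comma-separated list of jars. */
    static String libjars(List<String> jars) {
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "target/dependency");
        List<String> jars = jarPaths(dir);
        System.out.println("export HADOOP_CLASSPATH=" + hadoopClasspath(jars));
        System.out.println("hadoop jar <job.jar> <MainClass> -libjars " + libjars(jars) + " <args>");
    }
}
```

Generating both values from the same directory listing keeps the client classpath and the jars shipped to the cluster in sync.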
