crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Copying to DistributedCache using -files
Date Thu, 19 Dec 2013 19:38:16 GMT
Hey Miguel,

You need to call:

ToolRunner.run(new Configuration(), new MaxmindCrunchJob(), args);

in main() to pick up the args from the command line.
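
To illustrate why this matters, here is a toy model of what happens to the
command line before Tool.run() sees it. It is only a sketch under stated
assumptions, not Hadoop's implementation: the class GenericOptionsSketch, its
Parsed holder, and the file name GeoIP.dat are hypothetical, and the real
parser behind ToolRunner does more (for example, it validates the paths and
handles other generic options). The only detail taken from this thread is that
the value given to "-files" ends up in the "tmpfiles" configuration entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Hadoop's classes.
public class GenericOptionsSketch {

    /** Result of splitting the command line: config entries plus leftover args. */
    static final class Parsed {
        final Map<String, String> conf = new HashMap<>();
        final List<String> remainingArgs = new ArrayList<>();
    }

    /**
     * Moves "-files <list>" into the configuration under "tmpfiles" and keeps
     * every other argument for the application.
     */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            if ("-files".equals(args[i]) && i + 1 < args.length) {
                // Consume the option and its value; the tool never sees them.
                parsed.conf.put("tmpfiles", args[++i]);
            } else {
                parsed.remainingArgs.add(args[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Hypothetical command line for illustration.
        String[] commandLine = {"-files", "GeoIP.dat", "input", "output"};
        Parsed parsed = parse(commandLine);
        System.out.println("tmpfiles = " + parsed.conf.get("tmpfiles"));
        System.out.println("tool args = " + parsed.remainingArgs);
    }
}
```

The point of the sketch is that the option is consumed into the configuration
and only the remaining arguments are handed to the tool, so the job has to be
built from that same configuration (getConf() inside the Tool) for the setting
to have any effect.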

J


On Thu, Dec 19, 2013 at 8:42 AM, Miguel Paraz <mparaz@gmail.com> wrote:

> Hi,
> I'm studying Crunch with code that relies on the DistributedCache to copy
> files to the local filesystem. (My code is at
> https://bitbucket.org/mparaz/maxmind-crunch)
>
> I'm using 0.9.0-mapreduce2 on a 2.2.0 setup (Hortonworks Sandbox 2.0).
>
> I see that Crunch programs use the same pattern as low-level MapReduce,
> with ToolRunner.run() and implementing Tool.run().
>
> Unfortunately, the file I specify with the "-files" parameter is not
> copied.
> I logged getConf().get("tmpfiles") and that configuration entry is there.
>
> At which point should the file be copied? I looked through the Hadoop source
> code and found that tmpfiles is processed in
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
> - copyAndConfigureFiles()
>
> Is this code not invoked when Crunch is used?
> This works with the equivalent MapReduce 2.2.0 API code.
>
> Is there a working example with distributed files that I could try?
>
> Thanks!
> Miguel
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
