crunch-user mailing list archives

From Miguel Paraz <mpa...@gmail.com>
Subject Re: Copying to DistributedCache using -files
Date Fri, 20 Dec 2013 04:26:44 GMT
Hi Josh,

It's working now. Thanks for helping with my newbie question, and looking
at the code.

It's confusing that omitting the new Configuration() argument works with
the plain MapReduce API.

Cheers,
Miguel


On Fri, Dec 20, 2013 at 3:38 AM, Josh Wills <jwills@cloudera.com> wrote:

> Hey Miguel,
>
> You need to call:
>
> ToolRunner.run(new Configuration(), new MaxmindCrunchJob(), args);
>
> in main() to pick up the args from the command line.
>
> J
>
>
> On Thu, Dec 19, 2013 at 8:42 AM, Miguel Paraz <mparaz@gmail.com> wrote:
>
>> Hi,
>> I'm studying Crunch with code that relies on the DistributedCache to copy
>> files to the local filesystem. (My code is at
>> https://bitbucket.org/mparaz/maxmind-crunch)
>>
>> I'm using 0.9.0-mapreduce2 on a 2.2.0 setup (Hortonworks Sandbox 2.0).
>>
>> I see that Crunch programs use the same pattern as low-level MapReduce,
>> with ToolRunner.run() and implementing Tool.run().
>>
>> Unfortunately, the file I specify with the "-files" parameter is not
>> copied.
>> I logged getConf().get("tmpfiles") and that configuration entry is there.
>>
>> At which point should the file be copied? I looked through the Hadoop source
>> code and found that tmpfiles is processed in
>> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
>> - copyAndConfigureFiles()
>>
>> Is this code not invoked when Crunch is used?
>> This works with the equivalent MapReduce 2.2.0 API code.
>>
>> Is there a working example with distributed files that I could try?
>>
>> Thanks!
>> Miguel
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
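
For reference, here is a minimal sketch of the pattern discussed in this
thread, assuming a Hadoop 2.x client on the classpath. The class name
FilesOptionDemo is hypothetical and stands in for a job class such as
MaxmindCrunchJob; the Crunch pipeline itself is only indicated in a comment.
It illustrates that ToolRunner parses generic options such as -files into
the Configuration it hands to the Tool, so the job would normally be built
from getConf() rather than from a fresh Configuration.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical demo class; not part of the code discussed in the thread.
public class FilesOptionDemo extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already removed the generic options (-files, -D, ...)
        // from args and recorded them in the Configuration set on this Tool.
        Configuration conf = getConf();
        System.out.println("tmpfiles = " + conf.get("tmpfiles"));
        System.out.println("remaining args = " + Arrays.toString(args));

        // Assumption: a Crunch job would create its pipeline from this same
        // Configuration here, e.g. new MRPipeline(FilesOptionDemo.class, conf),
        // so that the "tmpfiles" entry reaches job submission.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Configuration first, then the Tool, then the raw command-line args.
        int exitCode = ToolRunner.run(new Configuration(), new FilesOptionDemo(), args);
        System.exit(exitCode);
    }
}
```

It could be launched with something like
hadoop jar <your-jar> FilesOptionDemo -files <local-file> <input> <output>,
where the jar and file names are placeholders. The first printed line should
then show the file recorded under "tmpfiles", and the remaining args should
contain only the input and output paths.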
