crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Copying to DistributedCache using -files
Date Fri, 20 Dec 2013 04:29:21 GMT
No problem. I've never understood it either, just one of those things I
noticed a long time ago. :)


On Thu, Dec 19, 2013 at 8:26 PM, Miguel Paraz <mparaz@gmail.com> wrote:

> Hi Josh,
>
> It's working now. Thanks for helping with my newbie question, and looking
> at the code.
>
> It's confusing that omitting the new Configuration() argument works with
> the plain MapReduce API.
>
> Cheers,
> Miguel
>
>
> On Fri, Dec 20, 2013 at 3:38 AM, Josh Wills <jwills@cloudera.com> wrote:
>
>> Hey Miguel,
>>
>> You need to call:
>>
>> ToolRunner.run(new Configuration(), new MaxmindCrunchJob(), args);
>>
>> in main() to pick up the args from the command line.
>>
>> J
>>
>>
>> On Thu, Dec 19, 2013 at 8:42 AM, Miguel Paraz <mparaz@gmail.com> wrote:
>>
>>> Hi,
>>> I'm studying Crunch with code that relies on the DistributedCache to
>>> copy files to the local filesystem. (My code is at
>>> https://bitbucket.org/mparaz/maxmind-crunch)
>>>
>>> I'm using 0.9.0-mapreduce2 on a 2.2.0 setup (Hortonworks Sandbox 2.0).
>>>
>>> I see that Crunch programs use the same pattern as low-level MapReduce,
>>> with ToolRunner.run() and implementing Tool.run().
>>>
>>> Unfortunately, the file I specify with the "-files" parameter is not
>>> copied.
>>> I logged getConf().get("tmpfiles") and that configuration entry is there.
>>>
>>> At which point should the file be copied? I looked through the Hadoop
>>> source code and found that tmpfiles is processed in
>>> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
>>> - copyAndConfigureFiles()
>>>
>>> Is this code not invoked when Crunch is used?
>>> This works with the equivalent MapReduce 2.2.0 API code.
>>>
>>> Is there a working example with distributed files that I could try?
>>>
>>> Thanks!
>>> Miguel
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
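
For readers following the thread, here is a rough, standalone sketch of the
idea behind "-files" and the "tmpfiles" entry mentioned above: generic options
are taken out of the command line and recorded in the configuration before the
tool's own run() sees the remaining arguments. This is a simplified model, not
Hadoop's code. The class name GenericOptionsSketch, the parse() helper, the
plain Map standing in for Configuration, and the file name GeoIP.dat are all
assumptions made for illustration. Real Hadoop does this in
GenericOptionsParser, which ToolRunner invokes, and it stores qualified URIs
rather than the raw string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Hadoop's generic-option handling; not Hadoop code.
final class GenericOptionsSketch {

    // Moves "-files <list>" out of args into conf under the "tmpfiles" key
    // and returns the arguments that the tool itself should see.
    static String[] parse(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-files".equals(args[i]) && i + 1 < args.length) {
                // Consume the option value; real Hadoop stores qualified URIs.
                conf.put("tmpfiles", args[++i]);
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical command line: one generic option plus two tool arguments.
        String[] toolArgs = parse(conf, new String[] {"-files", "GeoIP.dat", "in", "out"});
        System.out.println("tmpfiles = " + conf.get("tmpfiles"));
        System.out.println("tool args = " + String.join(" ", toolArgs));
    }
}
```

The sketch only models how the option ends up in the configuration. It does
not explain why the Crunch job in this thread needed the explicit
Configuration argument, which the thread itself leaves open.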
