crunch-user mailing list archives

From: Miguel Paraz <mpa...@gmail.com>
Subject: Copying to DistributedCache using -files
Date: Thu, 19 Dec 2013 16:42:22 GMT
Hi,
I'm studying Crunch with code that relies on the DistributedCache to copy
files to the local filesystem. (My code is at
https://bitbucket.org/mparaz/maxmind-crunch)

I'm using 0.9.0-mapreduce2 on a 2.2.0 setup (Hortonworks Sandbox 2.0).

I see that Crunch programs follow the same pattern as low-level MapReduce
programs: the driver implements Tool.run() and is launched with ToolRunner.run().

Unfortunately, the file I specify with the "-files" parameter is not copied.
I logged getConf().get("tmpfiles") and that configuration entry is there.
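A rough, stdlib-only model of this flow (a sketch under stated assumptions, not Hadoop's code): generic options such as -files are consumed before Tool.run() is called and recorded in the Configuration, with -files stored under the "tmpfiles" key, and only the remaining arguments reach the tool. The class GenericOptionsModel is hypothetical. The real GenericOptionsParser also handles -libjars, -archives, -conf and others, and qualifies each -files entry into a full URI; none of that is modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of how ToolRunner/GenericOptionsParser split
// the command line before Tool.run() is invoked. Only -files and "-D key=value"
// are modelled; the real parser supports many more options.
final class GenericOptionsModel {

    /** Configuration entries that were set, plus the leftover args for Tool.run(). */
    static final class Parsed {
        final Map<String, String> conf = new HashMap<>();
        final List<String> toolArgs = new ArrayList<>();
    }

    static Parsed parse(String... args) {
        Parsed p = new Parsed();
        int i = 0;
        // Parsing stops at the first non-option argument, so generic
        // options must come before the application's own arguments.
        while (i < args.length) {
            String a = args[i];
            if (a.equals("-files") && i + 1 < args.length) {
                // Only the comma-separated list is recorded here; nothing is
                // copied yet. Copying happens later, at job submission.
                p.conf.put("tmpfiles", args[i + 1]);
                i += 2;
            } else if (a.equals("-D") && i + 1 < args.length) {
                String[] kv = args[i + 1].split("=", 2);
                p.conf.put(kv[0], kv.length > 1 ? kv[1] : "");
                i += 2;
            } else {
                break;
            }
        }
        // Everything after the generic options is handed to Tool.run().
        p.toolArgs.addAll(Arrays.asList(args).subList(i, args.length));
        return p;
    }
}
```

With parse("-files", "lookup.dat", "in", "out") (hypothetical file names), the model records tmpfiles=lookup.dat and passes [in, out] to the tool. This matches seeing "tmpfiles" in getConf() even though no file has been copied at that point.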

At which point should the file be copied? I looked through the Hadoop source
code and found that tmpfiles is processed in copyAndConfigureFiles() in
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
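Roughly speaking, copyAndConfigureFiles() walks the comma-separated tmpfiles list, copies each entry into the job's submission directory and registers it with the DistributedCache. On the task side the file is then expected under a short name in the working directory. The sketch below (hypothetical class CacheLinkNames, stdlib only) computes that name. It assumes that a "#name" URI fragment sets the link name and that the last path segment is used otherwise. It does not copy anything.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: given the value of "tmpfiles", list the names under
// which each file is expected to appear in a task's working directory.
// Assumption: "#name" fragment wins, otherwise the last path segment is used.
final class CacheLinkNames {

    static List<String> linkNames(String tmpfiles) {
        List<String> names = new ArrayList<>();
        if (tmpfiles == null || tmpfiles.isEmpty()) {
            return names;
        }
        for (String entry : tmpfiles.split(",")) {
            URI uri;
            try {
                uri = new URI(entry.trim());
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Not a valid URI: " + entry, e);
            }
            String fragment = uri.getFragment();
            if (fragment != null && !fragment.isEmpty()) {
                // "hdfs://.../geo.dat#geo" is exposed as "geo".
                names.add(fragment);
            } else {
                // "file:/tmp/lookup.dat" is exposed as "lookup.dat".
                String path = uri.getPath();
                names.add(path.substring(path.lastIndexOf('/') + 1));
            }
        }
        return names;
    }
}
```

For example, tmpfiles=file:/tmp/lookup.dat,hdfs://nn:8020/data/geo.dat#geo yields [lookup.dat, geo]. Assuming the usual YARN behaviour of symlinking localized files into the container's working directory, task code would open them by those relative names.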

Is this code not invoked when Crunch is used?
The equivalent code written against the MapReduce 2.2.0 API works.

Is there a working example using distributed files that I could try?

Thanks!
Miguel
