crunch-user mailing list archives

From Miguel Paraz <>
Subject Copying to DistributedCache using -files
Date Thu, 19 Dec 2013 16:42:22 GMT
I'm studying Crunch with code that relies on the DistributedCache to copy
files to the local filesystem. (My code is at

I'm using Crunch 0.9.0-mapreduce2 on a Hadoop 2.2.0 setup (Hortonworks Sandbox 2.0).

I see that Crunch programs use the same pattern as low-level MapReduce,
with and implementing

Unfortunately, the file I specify with the "-files" parameter is not copied.
I logged getConf().get("tmpfiles") and that configuration entry is there.

At which point should the file be copied? I looked through the Hadoop source
code and found that tmpfiles is processed in
- copyAndConfigureFiles()

Is this code not invoked when Crunch is used?
This works with the equivalent MapReduce 2.2.0 API code.

Is there a working example with distributed files that I could try?

