hadoop-common-user mailing list archives

From "W.P. McNeill" <bill...@gmail.com>
Subject Adding a soft-linked archive file to the distributed cache doesn't work as advertised
Date Mon, 09 Jan 2012 18:30:57 GMT
I am trying to add a zip file to the distributed cache and have it unzipped
on the task nodes with a softlink to the unzipped directory placed in the
working directory of my mapper process. I think I'm doing everything the
way the documentation tells me to, but it's not working.

On the client, in the run() function where I create the job, I first call:

fs.copyFromLocalFile(new Path("gate-app.zip"), new Path("/tmp/gate-app.zip"));

As expected, this copies the archive file gate-app.zip to the HDFS
directory /tmp.

Then I call:

DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"), job.getConfiguration());

I expect this to add "/tmp/gate-app.zip" to the distributed cache and put a
softlink to it called gate-app in the working directory of each task.
However, when I call job.waitForCompletion(), I see the following error:

Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /tmp/gate-app.zip#gate-app.

It appears that the distributed cache mechanism is interpreting the entire
URI as the literal name of the file, instead of treating the fragment as
the name of the softlink.
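
Just to make the expectation concrete, here is a hypothetical mapper (not my actual code) showing how a task would consume the unzipped archive through that symlink:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper, only to show the expectation: the URI fragment
// ("#gate-app") should surface as a symlink named "gate-app" in the
// task's working directory, pointing at the unzipped archive.
public class GateAppMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) throws IOException {
    File gateApp = new File("gate-app");
    if (!gateApp.isDirectory()) {
      throw new IOException("expected symlink gate-app to the unzipped archive");
    }
  }
}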

As far as I can tell, I'm doing this correctly according to the API documentation.
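
For reference, my reading of the DistributedCache javadoc (the 0.20/1.x API) boils down to the sketch below. This is illustrative rather than my actual driver code; one step the javadoc also calls out is DistributedCache.createSymlink(), which turns symlink creation on:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheRecipeSketch {
  // Sketch of the documented recipe; the gate-app names are from my job,
  // and this helper is for illustration only.
  static void addGateApp(Job job) throws Exception {
    Configuration conf = job.getConfiguration();
    FileSystem fs = FileSystem.get(conf);

    // 1. Ship the archive to HDFS so the tasktrackers can fetch it.
    fs.copyFromLocalFile(new Path("gate-app.zip"), new Path("/tmp/gate-app.zip"));

    // 2. Register it as a cache archive; the fragment names the symlink.
    DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"), conf);

    // 3. Enable symlink creation (sets mapred.create.symlink=yes).
    DistributedCache.createSymlink(conf);
  }
}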

The full project in which I'm doing this is up on github:

Can someone tell me what I'm doing wrong?
