hadoop-common-user mailing list archives

From "W.P. McNeill" <bill...@gmail.com>
Subject Re: Adding a soft-linked archive file to the distributed cache doesn't work as advertised
Date Mon, 09 Jan 2012 19:59:51 GMT
I added a DistributedCache.createSymlink(configuration) call right after
the addCacheArchive() call, but I still see the same error.

On Mon, Jan 9, 2012 at 11:05 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:

> Bill,
>
> In addition, you must call DistributedCache.createSymlink(configuration);
> that should do it.
>
> Thxs.
>
> Alejandro
>
> On Mon, Jan 9, 2012 at 10:30 AM, W.P. McNeill <billmcn@gmail.com> wrote:
>
> > I am trying to add a zip file to the distributed cache and have it unzipped
> > on the task nodes with a softlink to the unzipped directory placed in the
> > working directory of my mapper process. I think I'm doing everything the
> > way the documentation tells me to, but it's not working.
> >
> > On the client in the run() function while I'm creating the job I first
> > call:
> >
> > fs.copyFromLocalFile(new Path("gate-app.zip"), new Path("/tmp/gate-app.zip"));
> >
> > As expected, this copies the archive file gate-app.zip to the HDFS
> > directory /tmp.
> >
> > Then I call
> >
> > DistributedCache.addCacheArchive("/tmp/gate-app.zip#gate-app",
> > configuration);
> >
> > I expect this to add "/tmp/gate-app.zip" to the distributed cache and put a
> > softlink to it called gate-app in the working directory of each task.
> > However, when I call job.waitForCompletion(), I see the following error:
> >
> > Exception in thread "main" java.io.FileNotFoundException: File does not
> > exist: /tmp/gate-app.zip#gate-app.
> >
> > It appears that the distributed cache mechanism is interpreting the entire
> > URI as the literal name of the file, instead of treating the fragment as
> > the name of the softlink.
> >
> > As far as I can tell, I'm doing this correctly according to the API
> > documentation:
> >
> >
> > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
> >
> > The full project in which I'm doing this is up on github:
> > https://github.com/wpm/Hadoop-GATE.
> >
> > Can someone tell me what I'm doing wrong?
> >
>
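
The error text quoted above ("File does not exist: /tmp/gate-app.zip#gate-app")
is what one would expect if the '#' ended up inside the path component of the
URI instead of being parsed as a fragment. Whether that is what actually
happens in this job is an assumption; nothing in this thread confirms it. The
sketch below uses only java.net.URI to show the two outcomes. The class and
method names (FragmentDemo, parseWhole, pathOnly) are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Hadoop or of the project in this thread.
class FragmentDemo {

    // Parse the whole string as a URI reference: '#' starts the fragment.
    static URI parseWhole(String spec) {
        return URI.create(spec);
    }

    // Pass the whole string as the *path* argument of a multi-argument
    // constructor: '#' is not legal in a path, so it is percent-encoded
    // and no fragment is set.
    static URI pathOnly(String spec) {
        try {
            return new URI(null, null, spec, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String spec = "/tmp/gate-app.zip#gate-app";

        URI a = parseWhole(spec);
        // prints: /tmp/gate-app.zip | gate-app
        System.out.println(a.getPath() + " | " + a.getFragment());

        URI b = pathOnly(spec);
        // prints: /tmp/gate-app.zip#gate-app | null
        System.out.println(b.getPath() + " | " + b.getFragment());
    }
}
```

DistributedCache.addCacheArchive() takes a java.net.URI, so it may be worth
printing getPath() and getFragment() on the URI object that is actually passed
to it. If the fragment comes back null, the link name never reached the
distributed cache as a fragment.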
