hadoop-common-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Distributed Cache with New API
Date Thu, 15 Apr 2010 20:57:33 GMT
Please take a look at the loop starting at line 158 in TaskRunner.java:
            p[i] = DistributedCache.getLocalCache(files[i], conf,
                                                  new Path(baseDir),
                                                  fileStatus,
                                                  false,
                                                  Long.parseLong(fileTimestamps[i]),
                                                  new Path(workDir.getAbsolutePath()),
                                                  false);
          }
          DistributedCache.setLocalFiles(conf, stringifyPathArray(p));

I think the confusing part is that DistributedCache.getLocalCacheFiles() is
paired with DistributedCache.setLocalFiles().
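
Since setLocalFiles() is only called for files registered in the configuration
the job actually runs with, one possible cause of the null result (an
assumption, not confirmed in this thread) is that the Job keeps its own copy of
the Configuration it is constructed from, so a file added to the original conf
afterwards never reaches the tasks. The sketch below models that behaviour in
plain Java. MiniConf, MiniJob, CACHE_KEY and seenByTask are hypothetical
stand-ins invented for illustration; they are not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheConfDemo {

    /** Hypothetical stand-in for a Configuration: a simple string map. */
    static class MiniConf {
        private final Map<String, String> props = new HashMap<>();

        MiniConf() {}

        /** Copy constructor: later changes to 'other' are not visible here. */
        MiniConf(MiniConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for a Job, ASSUMED to copy the conf it is given. */
    static class MiniJob {
        private final MiniConf conf;

        MiniJob(MiniConf conf) {
            this.conf = new MiniConf(conf);
        }

        MiniConf getConfiguration() {
            return conf;
        }
    }

    /** Illustrative property name, not a real Hadoop key. */
    static final String CACHE_KEY = "cache.files";

    /**
     * Registers a cache file after the job has been created, either on the
     * job's own configuration or on the original one, and returns what the
     * job's configuration (the one tasks would see) contains afterwards.
     */
    static String seenByTask(boolean registerOnJobConf) {
        MiniConf conf = new MiniConf();
        MiniJob job = new MiniJob(conf);
        MiniConf target = registerOnJobConf ? job.getConfiguration() : conf;
        target.set(CACHE_KEY, "/myapp/lookup.dat");
        return job.getConfiguration().get(CACHE_KEY);
    }

    public static void main(String[] args) {
        System.out.println("registered on original conf: " + seenByTask(false));
        System.out.println("registered on job's conf:    " + seenByTask(true));
    }
}
```

If that assumption holds for the real classes, the corresponding change in the
driver would be to pass job.getConfiguration() instead of conf to
DistributedCache.addCacheFile(), or to add the file before the Job is created.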

Cheers

On Thu, Apr 15, 2010 at 1:16 PM, Larry Compton
<lawrence.compton@gmail.com> wrote:

> Ted,
>
> Thanks. I have looked at that example. The javadocs for DistributedCache
> still refer to deprecated classes, like JobConf. I'm trying to use the
> revised API.
>
> Larry
>
> On Thu, Apr 15, 2010 at 4:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Please see the sample within
> > src\core\org\apache\hadoop\filecache\DistributedCache.java:
> >
> >  *     JobConf job = new JobConf();
> >  *     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
> >  *                                   job);
> >
> >
> > On Thu, Apr 15, 2010 at 12:56 PM, Larry Compton
> > <lawrence.compton@gmail.com> wrote:
> >
> > > I'm trying to use the distributed cache in a MapReduce job written to the
> > > new API (org.apache.hadoop.mapreduce.*). In my "Tool" class, a file path is
> > > added to the distributed cache as follows:
> > >
> > >    public int run(String[] args) throws Exception {
> > >        Configuration conf = getConf();
> > >        Job job = new Job(conf, "Job");
> > >        ...
> > >        DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
> > >        ...
> > >        return job.waitForCompletion(true) ? 0 : 1;
> > >    }
> > >
> > > The "setup()" method in my mapper tries to read the path as follows:
> > >
> > >    protected void setup(Context context) throws IOException {
> > >        Path[] paths = DistributedCache.getLocalCacheFiles(context
> > >                .getConfiguration());
> > >    }
> > >
> > > But "paths" is null.
> > >
> > > I'm assuming I'm setting up the distributed cache incorrectly. I've seen a
> > > few hints in previous mailing list postings that indicate that the
> > > distributed cache is accessed via the Job and JobContext objects in the
> > > revised API, but the javadocs don't seem to support that.
> > >
> > > Thanks.
> > > Larry
> > >
> >
>
