hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Moving Files to Distributed Cache in MapReduce
Date Sat, 30 Jul 2011 02:49:27 GMT

Here's the meat of my post earlier...
Sample code on putting a file on the cache:
DistributedCache.addCacheFile(new URI(path + "MyFileName"), conf);

Sample code in pulling data off the cache:
        Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        boolean exitProcess = false;
        int i = 0;
        // No bounds check here: if "model.txt" is not in the cache, i runs past the end of localFiles.
        while (!exitProcess) {
            String fileName = localFiles[i].getName();
            if (fileName.equalsIgnoreCase("model.txt")) {
                // Build your input file reader on localFiles[i].toString()
                exitProcess = true;
            }
            i++;
        }
 
 
Note that this is SAMPLE code. I didn't trap the exit condition if the file isn't there and
you go beyond the size of the array localFiles[].
Also I set exitProcess to false because it's easier to read this as "Do this loop until the condition
exitProcess is true".
 
When you build your file reader you need the full path, not just the file name. The path will
vary when the job runs.
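
A bounded version of that lookup might look like the sketch below. It is only an illustration under stated assumptions: java.nio.file.Path stands in for Hadoop's Path so the example runs without a cluster, and the class and method names (CacheFileLookup, findByName, openReader) are hypothetical. In a real job the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), and you would call getName() on the Hadoop Path instead of getFileName().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Stand-in sketch: java.nio.file.Path plays the role of Hadoop's Path here.
// All names in this class are hypothetical, chosen for illustration only.
class CacheFileLookup {

    // Returns the first entry whose file name matches (ignoring case),
    // or empty if there is none. The for-each loop bounds the search,
    // so a missing file cannot run past the end of the array.
    static Optional<Path> findByName(Path[] localFiles, String wanted) {
        if (localFiles == null) {
            // Defensive guard (assumption): treat "nothing cached" as an empty result.
            return Optional.empty();
        }
        for (Path p : localFiles) {
            Path name = p.getFileName();
            if (name != null && name.toString().equalsIgnoreCase(wanted)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    // Opens a reader on the full local path, not just the bare file name.
    static BufferedReader openReader(Path fullPath) throws IOException {
        return Files.newBufferedReader(fullPath);
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that mimics a local cache location.
        Path dir = Files.createTempDirectory("cache-demo");
        Path model = Files.write(dir.resolve("model.txt"), "hello".getBytes());
        Path[] localFiles = { dir.resolve("other.dat"), model };

        Optional<Path> hit = findByName(localFiles, "MODEL.TXT");
        if (hit.isPresent()) {
            try (BufferedReader in = openReader(hit.get())) {
                System.out.println(in.readLine());
            }
        }
        System.out.println(findByName(localFiles, "missing.txt").isPresent());
    }
}
```

Returning an Optional makes the "file isn't there" case explicit, so the caller decides whether to fail the task or fall back to a default.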
 
HTH
 
-Mike
 

> From: michael_segel@hotmail.com
> To: common-user@hadoop.apache.org
> Subject: RE: Moving Files to Distributed Cache in MapReduce
> Date: Fri, 29 Jul 2011 21:43:37 -0500
> 
> 
> I could have sworn that I gave an example earlier this week on how to push and pull stuff from distributed cache.
> 
> 
> > Date: Fri, 29 Jul 2011 14:51:26 -0700
> > Subject: Re: Moving Files to Distributed Cache in MapReduce
> > From: rogchen@ucdavis.edu
> > To: common-user@hadoop.apache.org
> > 
> > JobConf is deprecated in 0.20.2, I believe; you're supposed to be using
> > Configuration for that
> > 
> > On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > 
> > > Is this what you are looking for?
> > >
> > > http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
> > >
> > > search for jobConf
> > >
> > > On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen <rogchen@ucdavis.edu> wrote:
> > > > Thanks for the response! However, I'm having an issue with this line
> > > >
> > > > Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
> > > >
> > > > because conf has private access in org.apache.hadoop.conf.Configured
> > > >
> > > > On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
> > > >
> > > >> I hope my previous reply helps...
> > > >>
> > > > >> On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
> > > > >>
> > > > >> > After moving it to the distributed cache, how would I call it within my
> > > > >> > MapReduce program?
> > > >> >
> > > > >> > On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn <mapred.learn@gmail.com> wrote:
> > > > >> >
> > > > >> > > Did you try using -files option in your hadoop jar command as:
> > > > >> > >
> > > > >> > > /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>
> > > >> > >
> > > >> > >
> > > > >> > > On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
> > > > >> > >
> > > > >> > > > Slight modification: I now know how to add files to the distributed file
> > > > >> > > > cache, which can be done via this command placed in the main or run
> > > > >> > > > class:
> > > > >> > > >
> > > > >> > > >        DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"), conf);
> > > > >> > > >
> > > > >> > > > However I am still having trouble locating the file in the distributed
> > > > >> > > > cache. *How do I call the file path of thefile.dat in the distributed
> > > > >> > > > cache as a string?* I am using Hadoop 0.20.2
> > > >> > > >
> > > >> > > >
> > > > >> > > > On Fri, Jul 29, 2011 at 10:26 AM, Roger Chen <rogchen@ucdavis.edu> wrote:
> > > > >> > > >
> > > > >> > > > > Hi all,
> > > > >> > > > >
> > > > >> > > > > Does anybody have examples of how one moves files from the local
> > > > >> > > > > filestructure/HDFS to the distributed cache in MapReduce? A Google
> > > > >> > > > > search turned up examples in Pig but not MR.
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Roger Chen
> > > >> > > > > UC Davis Genome Center
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Roger Chen
> > > >> > > > UC Davis Genome Center
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Roger Chen
> > > >> > UC Davis Genome Center
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Roger Chen
> > > > UC Davis Genome Center
> > > >
> > >
> > 
> > 
> > 
> > -- 
> > Roger Chen
> > UC Davis Genome Center