hadoop-common-user mailing list archives

From Chris Curtin <curtin.ch...@gmail.com>
Subject Re: Using addCacheArchive
Date Mon, 29 Jun 2009 12:07:08 GMT
To push the file to HDFS (into the 'a_hdfsDirectory' directory):

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path srcPath = new Path(a_directory + "/" + outputName);
Path dstPath = new Path(a_hdfsDirectory + "/" + outputName);
hdfs.copyFromLocalFile(srcPath, dstPath);


To read it from HDFS in your mapper or reducer:

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path cachePath = new Path(a_hdfsDirectory + "/" + outputName);
// Open the file through the HDFS API; java.io.FileReader would look on the local disk.
BufferedReader wordReader = new BufferedReader(
        new InputStreamReader(hdfs.open(cachePath)));
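
For the directory case asked about in the quoted message, here is a rough sketch of the archive route, in case it is useful. It uses only the JDK, so the Hadoop calls appear as comments. As I read the DistributedCache documentation, the fragment of the cache URI ("#Config") becomes the symlink name in the task's working directory, so the zip entries should be relative to the directory being zipped. The class name ConfigArchive and the helpers zipDirectory and cacheUri are made up for illustration; the HDFS path is the one from the quoted message.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ConfigArchive {

    /** Zips the files under dir into zipFile; entry names are relative to dir. */
    public static void zipDirectory(Path dir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' as the separator on every platform.
                String name = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    /** Builds a cache URI whose fragment is the desired symlink name. */
    public static URI cacheUri(String hdfsPath, String linkName) throws URISyntaxException {
        return new URI(null, null, hdfsPath, linkName);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real Config directory.
        Path dir = Files.createTempDirectory("Config");
        Files.write(dir.resolve("file1.config"), "key=value\n".getBytes(StandardCharsets.UTF_8));

        Path zip = Files.createTempFile("Config", ".zip");
        zipDirectory(dir, zip);

        URI uri = cacheUri("/home/akhil1988/Config.zip", "Config");
        System.out.println(uri);

        // In the job driver, after copying the zip to that HDFS path
        // (Hadoop calls, not executed in this sketch):
        //   DistributedCache.addCacheArchive(uri, conf);
        //   DistributedCache.createSymlink(conf);
    }
}
```

If the zip instead contains a top-level Config/ folder, I would expect the file to end up at Config/Config/file1.config in the task, so it is worth checking the entry names before submitting the job.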



On Fri, Jun 26, 2009 at 8:55 PM, akhil1988 <akhilanger@gmail.com> wrote:

>
> Thanks Chris for your reply!
>
> Well, I could not understand much of what has been discussed on that forum.
> I am unaware of Cascading.
>
> My problem is simple - I want a directory to be present in the local working
> directory of tasks so that I can access it from my map task in the
> following
> manner :
>
> FileInputStream fin = new FileInputStream("Config/file1.config");
>
> where,
> Config is a directory which contains many files/directories, one of which
> is
> file1.config
>
> It would be helpful to me if you can tell me what statements to use to
> distribute a directory to the tasktrackers.
> The API doc http://hadoop.apache.org/core/docs/r0.20.0/api/index.html says
> that archives are unzipped on the tasktrackers but I want an example of how
> to use this in case of a directory.
>
> Thanks,
> Akhil
>
>
>
> Chris Curtin-2 wrote:
> >
> > Hi,
> >
> > I've found it much easier to write the file to HDFS using the API, then
> > pass the 'path' to the file in HDFS as a property. You'll need to remember
> > to clean up the file after you're done with it.
> >
> > Example details are in this thread:
> >
> > http://groups.google.com/group/cascading-user/browse_thread/thread/d5c619349562a8d6#
> >
> > Hope this helps,
> >
> > Chris
> >
> > On Thu, Jun 25, 2009 at 4:50 PM, akhil1988 <akhilanger@gmail.com> wrote:
> >
> >>
> >> Please ask any questions if I am not clear above about the problem I am
> >> facing.
> >>
> >> Thanks,
> >> Akhil
> >>
> >> akhil1988 wrote:
> >> >
> >> > Hi All!
> >> >
> >> > I want a directory to be present in the local working directory of the
> >> > task for which I am using the following statements:
> >> >
> >> > DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf);
> >> > DistributedCache.createSymlink(conf);
> >> >
> >> >>> Here Config is a directory which I have zipped and put at the given
> >> >>> location in HDFS
> >> >
> >> > I have zipped the directory because the API doc of DistributedCache
> >> > (http://hadoop.apache.org/core/docs/r0.20.0/api/index.html) says that
> >> the
> >> > archive files are unzipped in the local cache directory :
> >> >
> >> > DistributedCache can be used to distribute simple, read-only data/text
> >> > files and/or more complex types such as archives, jars etc. Archives
> >> (zip,
> >> > tar and tgz/tar.gz files) are un-archived at the slave nodes.
> >> >
> >> > So, from my understanding of the API docs I expect that the Config.zip
> >> > file will be unzipped to Config directory and since I have SymLinked
> >> them
> >> > I can access the directory in the following manner from my map
> >> function:
> >> >
> >> > FileInputStream fin = new FileInputStream("Config/file1.config");
> >> >
> >> > But I get the FileNotFoundException on the execution of this
> statement.
> >> > Please let me know where I am going wrong.
> >> >
> >> > Thanks,
> >> > Akhil
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Using-addCacheArchive-tp24207739p24210836.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Using-addCacheArchive-tp24207739p24229338.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
