hadoop-common-user mailing list archives

From Diego Ceccarelli <diego.ceccare...@gmail.com>
Subject Re: problem using getLocalCacheArchives in DistributeCache
Date Thu, 19 May 2011 10:50:33 GMT
Dear all,

I finally solved the DistributedCache issue by using a symlink.
Before launching the job, I added:

// Activate symlinks; the URI fragment after '#' becomes the link name
DistributedCache.createSymlink(jobConf);
URI archiveUri = new URI(hdfsArchivePath + "#symbolicName");
DistributedCache.addCacheArchive(archiveUri, jobConf);

Then, inside the job, I used:

URL resource = jobConf.getResource("#symbolicName");

Now "resource" holds the path of the local directory into which the
archive was decompressed.
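
For reference, a minimal sketch of how such a cache URI splits into the archive path and the symlink name, using only java.net.URI. It assumes, as the DistributedCache documentation describes, that the fragment after '#' is used as the name of a symlink created in the task's working directory once symlinks are enabled. The class CacheUriSketch, its method names, and the example path are hypothetical and only illustrate the idea; no Hadoop classes are involved.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriSketch {

    // Part before '#': where the archive lives on HDFS.
    static String archivePath(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getPath();
    }

    // Part after '#': the name assumed to be used for the symlink.
    static String symlinkName(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getFragment();
    }

    // Assumption: the symlink sits in the task's working directory,
    // so a relative File with the fragment as its name refers to it.
    static File localArchiveDir(String cacheUri) throws URISyntaxException {
        return new File(symlinkName(cacheUri));
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical example URI, modelled on the one in this thread.
        String uri = "/user/myuser/myproject/mapfile.zip#symbolicName";
        System.out.println(archivePath(uri));
        System.out.println(symlinkName(uri));
        System.out.println(localArchiveDir(uri).getPath());
    }
}
```

Inside a task, the directory returned by localArchiveDir could then be listed or opened like any local directory, provided the symlink was actually created.
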
Hope it helps.


On Mon, May 16, 2011 at 11:00 PM, Diego Ceccarelli
<diego.ceccarelli@gmail.com> wrote:
> Hi all,
> I'm trying to distribute a MapFile to the local nodes using Hadoop's
> DistributedCache. As The Definitive Guide suggests, since a MapFile is a
> collection of files with a defined directory structure, I zipped it and
> copied it to HDFS:
> bin/hadoop fs -copyFromLocal mapfile.zip /user/myuser/myproject/
> and I tried to use the DistributedCache to send a copy of the mapfile
> to each node (as explained in [1]). So I set
> DistributedCache.addCacheArchive(
>     new Path("/user/myuser/myproject/mapfile.zip").toUri(), jobConf);
> and then in the reduce step I put:
> Path[] files = DistributedCache.getLocalCacheArchives(conf);
> This returns the path of the zipped file on the local node, whereas,
> according to [1], I expected to find the extracted archive:
> "DistributedCache can be used to distribute simple, read-only
> data/text files and/or more complex types such as archives, jars etc.
> Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave
> nodes."
> I also tried to unzip the file myself, but I never find the files at
> the expected path.
> Does anyone know where my mistake is? Could anyone show me some code
> to locally access a file within an archive?
> Thanks in advance!
> Diego
> [1] http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/filecache/DistributedCache.html

Computers are useless. They can only give you answers.
(Pablo Picasso)
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040
