hadoop-mapreduce-user mailing list archives

From Stephen TAK-LON WU <tak...@indiana.edu>
Subject DistributedCache downloads the second added archive from HDFS more than once?
Date Wed, 23 Jun 2010 12:46:39 GMT
Dear all,

I am using Hadoop 0.20.2 with the DistributedCache API.

I have found that whichever of the following ways I use to add cached
archives from HDFS to the local slaves, the second archive added is copied
to the local disk every single time I run the same job:

1. using setCacheArchives:

        URI[] cache = {new URI(database), new URI(program)};
        DistributedCache.setCacheArchives(cache, jc);

2. using addCacheArchive:

        DistributedCache.addCacheArchive(new URI(database), jc);
        DistributedCache.addCacheArchive(new URI(program), jc);
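
For reference, here is a minimal, self-contained driver sketch along the
lines of option 1. The HDFS URIs, class name, and input/output paths are
placeholders, not my actual job:

        import java.net.URI;

        import org.apache.hadoop.filecache.DistributedCache;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;

        public class CacheArchiveDemo {
            public static void main(String[] args) throws Exception {
                JobConf jc = new JobConf(CacheArchiveDemo.class);
                jc.setJobName("cache-archive-demo");

                // Placeholder archive locations on HDFS.
                String database = "hdfs://namenode:9000/cache/database.tar.gz";
                String program  = "hdfs://namenode:9000/cache/program.tar.gz";

                // Register both archives in one call (option 1 above);
                // I would expect each to be localized once and then
                // reused across runs of the same job.
                URI[] cache = { new URI(database), new URI(program) };
                DistributedCache.setCacheArchives(cache, jc);

                FileInputFormat.setInputPaths(jc, new Path(args[0]));
                FileOutputFormat.setOutputPath(jc, new Path(args[1]));
                JobClient.runJob(jc);
            }
        }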

I tracked this on the local slaves: the "program" archive, which is a
*.tar.gz file, is downloaded from HDFS and unpacked every time I submit
the job.
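
In case it is useful, one way to see what gets localized is to log the
cache paths from inside a task. A sketch, assuming a mapper on the old
org.apache.hadoop.mapred API (the class name and map logic are
placeholders):

        import java.io.IOException;

        import org.apache.hadoop.filecache.DistributedCache;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.MapReduceBase;
        import org.apache.hadoop.mapred.Mapper;
        import org.apache.hadoop.mapred.OutputCollector;
        import org.apache.hadoop.mapred.Reporter;

        public class LoggingMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {

            @Override
            public void configure(JobConf job) {
                try {
                    // Print where each archive was unpacked on this slave;
                    // this shows up in the task logs.
                    Path[] archives = DistributedCache.getLocalCacheArchives(job);
                    for (Path p : archives) {
                        System.err.println("localized archive: " + p);
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }

            @Override
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                // actual map logic goes here
            }
        }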

Does anyone know why this happens? Is there any solution to this problem?

Thank you so much.

Sincerely,
Stephen
