hadoop-common-user mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: distributed cache
Date Tue, 29 Jan 2008 04:43:00 GMT
jerrro wrote:
> Hello,
>
> Is there a way to use Distributed Cache with a pipes (C++ code) job? I want
> to be able to access a file on the local disk all over the data nodes, so
> hadoop would copy it to all data nodes before a map reduce job.
>
> Thanks.

Hi,

First of all, you need to copy the files to the DFS.
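For example, something like this from the shell (the local and DFS paths here are just made up for illustration):

bin/hadoop dfs -put /local/path/file1 /files/file1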
Then add the files to the distributed cache: you can give comma-separated 
paths of the files or archives to be added, via "mapred.cache.files" and 
"mapred.cache.archives" in the conf file.

Ex:

<property>
  <name>mapred.cache.files</name>
  <value>/files/file1,/files/file2.txt</value>
  <description> The files in distributed cache</description>
</property>

<property>
  <name>mapred.cache.archives</name>
  <value>/archives/arc1.zip,/archives/arc2.jar</value>
  <description>The archives in distributed cache</description>
</property>

You can also give full URIs instead of plain file names.
If you give a URI of the form hdfs://<path>#<link>, mapred will create a 
symlink named "link" in the task's working directory.
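
Since yours is a pipes job, the symlink shows up in the task's current 
working directory, so your C++ code can open it like any local file. Here 
is a rough sketch (the "dict" link name, the file contents and the 
word-filtering logic are all made up for illustration; depending on your 
version you may also have to set mapred.create.symlink to "yes" for the 
symlink to be created):

#include <fstream>
#include <set>
#include <string>
#include <vector>

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

// Loads a word list shipped through the distributed cache.
// "dict" is the #link name given in mapred.cache.files, e.g.
//   hdfs://<path>#dict
class DictMapper : public HadoopPipes::Mapper {
  std::set<std::string> dict;
public:
  DictMapper(HadoopPipes::TaskContext& context) {
    // The symlink lives in the task's working directory,
    // so a relative path is enough.
    std::ifstream in("dict");
    std::string word;
    while (in >> word) {
      dict.insert(word);
    }
  }

  void map(HadoopPipes::MapContext& context) {
    // Emit a count only for words present in the cached dictionary.
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      if (dict.count(words[i]) > 0) {
        context.emit(words[i], "1");
      }
    }
  }
};

class SumReducer : public HadoopPipes::Reducer {
public:
  SumReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main() {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<DictMapper, SumReducer>());
}

You would then upload the compiled binary to the dfs and submit with the 
usual pipes command, something like 
bin/hadoop pipes -conf job.xml -program /bin/dictcount -input in -output out 
(again, the names are just for illustration).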

Hope this clarifies things.

Thanks
Amareshwari

