hadoop-mapreduce-user mailing list archives

From Siddharth Dawar <siddharthdawa...@gmail.com>
Subject Accessing files in Hadoop 2.7.2 Distributed Cache
Date Tue, 07 Jun 2016 09:05:43 GMT
Hi,

I want to use the distributed cache to give my mappers access to data in
Hadoop 2.7.2. In main, I'm using the following code:

    String hdfs_path = "hdfs://localhost:9000/bloomfilter";
    InputStream in = new BufferedInputStream(
        new FileInputStream("/home/siddharth/Desktop/data/bloom_filter"));
    Configuration conf = new Configuration();
    fs = FileSystem.get(java.net.URI.create(hdfs_path), conf);
    OutputStream out = fs.create(new Path(hdfs_path));
    // Copy file from local to HDFS
    IOUtils.copyBytes(in, out, 4096, true);
    System.out.println(hdfs_path + " copied to HDFS");
    DistributedCache.addCacheFile(new Path(hdfs_path).toUri(), conf2);


The above code adds a file present on my local file system to HDFS and
adds it to the distributed cache.


However, in my mapper code, when I try to access the file stored in the
distributed cache, the Path[] p variable is null.


    public void configure(JobConf conf) {
        this.conf = conf;
        try {
            Path[] p = DistributedCache.getLocalCacheFiles(conf);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

Even when I try to access the distributed cache with the following code
in my mapper, I get an error saying that the bloomfilter file doesn't exist:

    strm = new DataInputStream(new FileInputStream("bloomfilter"));
    // Read into our Bloom filter.
    filter.readFields(strm);
    strm.close();

However, I read somewhere that if we add a file to the distributed cache,
we can access it directly by its name.
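
To make clear what I mean by "its name", here is a minimal sketch of my understanding, which may be wrong: the file should be visible in the task's working directory under the fragment of the cache URI (the part after "#") if there is one, and otherwise under the last component of the path. The class CacheLinkName and the method linkName are hypothetical names I made up for illustration; they are not part of Hadoop, and the sketch uses only java.net.URI.

```java
import java.net.URI;

public class CacheLinkName {
    // Hypothetical helper, not a Hadoop API: returns the name I would expect
    // a cached file to appear under in the task's working directory.
    // Assumption: the URI fragment is used if present, otherwise the last
    // component of the URI path.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // The URI from my code above, which has no fragment.
        System.out.println(linkName(URI.create("hdfs://localhost:9000/bloomfilter")));
        // The same file with an explicit fragment ("bf" is an arbitrary example).
        System.out.println(linkName(URI.create("hdfs://localhost:9000/bloomfilter#bf")));
    }
}
```

If that assumption holds, opening "bloomfilter" by name in the mapper should find the file, which is why the "doesn't exist" error confuses me.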

Can you please help me out?
