hadoop-hdfs-user mailing list archives

From "Botelho, Andrew" <Andrew.Bote...@emc.com>
Subject RE: Distributed Cache
Date Wed, 10 Jul 2013 21:43:35 GMT
OK, so JobContext.getCacheFiles() returns URI[].
Let's say I stored only one folder in the cache, and it has several .txt files in it.  How
do I use the returned URI to read each line of those .txt files?

Basically, how do I read my cached file(s) after I call JobContext.getCacheFiles()?
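
One way to approach the reading step is sketched below in plain Java, with no Hadoop dependency. It rests on an assumption the thread does not confirm: that each cache entry has been localized into the task's working directory under the last component of its URI path, or under the URI fragment when one is given. The class CachedTextReader and its methods localName, readTxtLines and readCached are hypothetical names made up for this illustration; they are not part of Hadoop.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class for illustration; not part of Hadoop.
class CachedTextReader {

    // ASSUMPTION: the local copy is named after the URI fragment if present,
    // otherwise after the last component of the URI path.
    public static String localName(URI cacheUri) {
        if (cacheUri.getFragment() != null) {
            return cacheUri.getFragment();
        }
        return Paths.get(cacheUri.getPath()).getFileName().toString();
    }

    // Reads every line of every .txt file directly inside dir (not recursive).
    public static List<String> readTxtLines(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        Collections.sort(files); // fixed order, so results are repeatable
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            lines.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        return lines;
    }

    // Resolves each cache URI against workDir and reads it, whether the
    // entry is a single file or a folder of .txt files.
    public static List<String> readCached(URI[] cacheUris, Path workDir) throws IOException {
        List<String> lines = new ArrayList<>();
        for (URI uri : cacheUris) {
            Path local = workDir.resolve(localName(uri));
            if (Files.isDirectory(local)) {
                lines.addAll(readTxtLines(local));
            } else {
                lines.addAll(Files.readAllLines(local, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }
}
```

In a Mapper's setup(Context context) method, this could be called as CachedTextReader.readCached(context.getCacheFiles(), Paths.get(".")). Whether a whole folder is localized this way may depend on the Hadoop version and cluster setup, so it is worth checking the task's working directory first.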



From: Omkar Joshi [mailto:ojoshi@hortonworks.com]
Sent: Wednesday, July 10, 2013 5:15 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache

try JobContext.getCacheFiles()

Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <Andrew.Botelho@emc.com> wrote:
Ok using job.addCacheFile() seems to compile correctly.
However, how do I then access the cached file in my Mapper code?  Is there a method that will
look for any files in the cache?



From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Tuesday, July 09, 2013 6:08 PM
To: user@hadoop.apache.org
Subject: Re: Distributed Cache

You should use Job#addCacheFile()

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <Andrew.Botelho@emc.com> wrote:

I was wondering whether I can still use the DistributedCache class in the latest release of
Hadoop (version 2.0.5).
In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

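Configuration conf = new Configuration();
// Deprecated call; this is the line that triggers the warning.
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
// Pass conf so the job sees the cache entry registered above.
Job job = Job.getInstance(conf);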

However, I keep getting warnings that the method addCacheFile() is deprecated.
Is there a more current way to add files to the distributed cache?

Thanks in advance,

