hadoop-mapreduce-user mailing list archives

From Omkar Joshi <ojo...@hortonworks.com>
Subject Re: Distributed Cache
Date Wed, 10 Jul 2013 22:47:17 GMT
      // Local (node-side) paths of the files localized from the distributed cache.
      Path[] cachedFilePaths =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      for (Path cachedFilePath : cachedFilePaths) {
        File cachedFile = new File(cachedFilePath.toUri().getRawPath());
        System.out.println("cached file path >> "
            + cachedFile.getAbsolutePath());
      }

I hope this helps for the time being. JobContext was supposed to replace
the DistributedCache API (which will be deprecated); however, there is some
problem with that, or I am missing something. I will reply if I find a
solution.

getCacheFiles will give you the URIs used for localizing the files (the
original URIs used when adding them to the cache).

getLocalCacheFiles will give you the actual file paths on the node manager.
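
Once you have such a local path, reading the files is plain Java I/O. Below
is a rough sketch, not a tested Hadoop recipe: it assumes the local path
points at a directory on the node's disk holding the .txt files, and it uses
only java.nio (note that java.nio.file.Path is a different class from
Hadoop's org.apache.hadoop.fs.Path). The class name CachedTextReader and the
method readAllLines are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CachedTextReader {

    // Collects every line from each *.txt file directly inside localDir.
    // Assumption: localDir is a localized cache directory, e.g. the string
    // form of a path obtained from getLocalCacheFiles(...).
    // Files are visited in no guaranteed order.
    static List<String> readAllLines(String localDir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> txtFiles =
                 Files.newDirectoryStream(Paths.get(localDir), "*.txt")) {
            for (Path txtFile : txtFiles) {
                lines.addAll(Files.readAllLines(txtFile, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a localized cache folder so the sketch runs on its own.
        Path dir = Files.createTempDirectory("cache-demo");
        Files.write(dir.resolve("a.txt"),
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readAllLines(dir.toString())) {
            System.out.println(line);
        }
    }
}
```

In a Mapper you would typically do this once in setup() and keep the lines
in memory, passing cachedFilePath.toUri().getRawPath() from the loop above
as localDir.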

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Wed, Jul 10, 2013 at 2:43 PM, Botelho, Andrew <Andrew.Botelho@emc.com> wrote:

> Ok so JobContext.getCacheFiles() returns URI[].
>
> Let’s say I only stored one folder in the cache that has several .txt
> files within it.  How do I use that returned URI to read each line of those
> .txt files?
>
> Basically, how do I read my cached file(s) after I call
> JobContext.getCacheFiles()?
>
> Thanks,
>
> Andrew
>
> *From:* Omkar Joshi [mailto:ojoshi@hortonworks.com]
> *Sent:* Wednesday, July 10, 2013 5:15 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> try JobContext.getCacheFiles()
>
> Thanks,
>
> Omkar Joshi
>
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
> On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <Andrew.Botelho@emc.com>
> wrote:
>
> Ok using job.addCacheFile() seems to compile correctly.
>
> However, how do I then access the cached file in my Mapper code?  Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> You should use Job#addCacheFile()
>
> Cheers
>
> On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <Andrew.Botelho@emc.com>
> wrote:
>
> Hi,
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (Version 2.0.5).
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.*;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = new Configuration();
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
> Job job = Job.getInstance();
> …
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.
>
> Is there a more current way to add files to the distributed cache?
>
> Thanks in advance,
>
> Andrew
