Subject: Re: Distributed Cache
From: Omkar Joshi <ojoshi@hortonworks.com>
To: user@hadoop.apache.org
Date: Wed, 10 Jul 2013 15:47:17 -0700

    Path[] cachedFilePaths =
        DistributedCache.getLocalCacheFiles(context.getConfiguration());
    for (Path cachedFilePath : cachedFilePaths) {
      File cachedFile = new File(cachedFilePath.toUri().getRawPath());
      System.out.println("cached file path >> " + cachedFile.getAbsolutePath());
    }

I hope this helps for the time being. JobContext was supposed to replace the DistributedCache API (which will be deprecated), but either there is some problem with that or I am missing something. I will reply if I find a solution.

getCacheFiles() gives you the URI used for localizing the files (the original URI used when adding them to the cache).
getLocalCacheFiles() gives you the actual file path on the node manager.

Thanks,
Omkar Joshi
*Hortonworks Inc.*

On Wed, Jul 10, 2013 at 2:43 PM, Botelho, Andrew wrote:

> Ok, so JobContext.getCacheFiles() returns URI[].
>
> Let's say I only stored one folder in the cache that has several .txt
> files within it.
> How do I use that returned URI to read each line of those
> .txt files?
>
> Basically, how do I read my cached file(s) after I call
> JobContext.getCacheFiles()?
>
> Thanks,
>
> Andrew
>
> *From:* Omkar Joshi [mailto:ojoshi@hortonworks.com]
> *Sent:* Wednesday, July 10, 2013 5:15 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> try JobContext.getCacheFiles()
>
> Thanks,
>
> Omkar Joshi
>
> *Hortonworks Inc.*
>
> On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew wrote:
>
> Ok, using job.addCacheFile() seems to compile correctly.
>
> However, how do I then access the cached file in my Mapper code? Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:yuzhihong@gmail.com]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Distributed Cache
>
> You should use Job#addCacheFile()
>
> Cheers
>
> On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew wrote:
>
> Hi,
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (version 2.0.5).
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.*;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = new Configuration();
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
> Job job = Job.getInstance();
> …
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.
>
> Is there a more current way to add files to the distributed cache?
>
> Thanks in advance,
>
> Andrew
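[Editor's note] For the "how do I read each line of those .txt files?" step that Andrew asks about, here is a minimal, self-contained sketch of the line-reading loop a Mapper's setup() would run. The Hadoop-specific part (resolving the local path from the distributed cache via getLocalCacheFiles() or JobContext.getCacheFiles()) is assumed to have already happened; this sketch simulates the localized file with a temp file, and the class and method names (CachedFileReadDemo, readCachedFile) are hypothetical, not Hadoop API.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;

public class CachedFileReadDemo {

    // Read every line of an already-localized cache file, the same way a
    // Mapper's setup() would after resolving the path from the cache.
    static List<String> readCachedFile(File cachedFile) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(cachedFile))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the file the framework localized onto the node.
        File cached = File.createTempFile("lookup", ".txt");
        cached.deleteOnExit();
        try (FileWriter w = new FileWriter(cached)) {
            w.write("alpha\nbeta\ngamma\n");
        }
        List<String> lines = readCachedFile(cached);
        System.out.println(lines.size() + " lines: " + lines);
    }
}
```

In a real job the driver would call job.addCacheFile(uri), and setup() would obtain the local path (in this thread's Hadoop 2.0.5 era, typically from DistributedCache.getLocalCacheFiles(context.getConfiguration())) before handing it to a loop like the one above.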