From: Harsh J <harsh@cloudera.com>
Date: Fri, 23 Nov 2012 02:20:54 +0530
Subject: Re: FileNotFoundException when getting files from DistributedCache
To: user@hadoop.apache.org

DistributedCache files are localized onto each task's node (they are not
read from HDFS at task time), so use the LocalFileSystem, or java.io.File
if you prefer that, to read them from within tasks.
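For example, a minimal setup() sketch along those lines (assuming the files
were registered with DistributedCache.addCacheFile() in the job driver, as
in the original question; the mapper class name and key/value types below
are illustrative, not taken from the thread):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative mapper: the class name and key/value types are
    // placeholders, not from the original job.
    public class CacheReadingMapper
        extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void setup(Context context)
          throws IOException, InterruptedException {
        // The framework has already copied the cached files onto this
        // task's node; getLocalCacheFiles() returns their local paths.
        Path[] cached =
            DistributedCache.getLocalCacheFiles(context.getConfiguration());

        // Open them through the local filesystem (file:///), not the
        // default (HDFS) filesystem that FileSystem.get() returns.
        FileSystem localFs = FileSystem.getLocal(context.getConfiguration());
        for (Path p : cached) {
          BufferedReader reader =
              new BufferedReader(new InputStreamReader(localFs.open(p)));
          try {
            String line;
            while ((line = reader.readLine()) != null) {
              // ... parse the cached file (e.g. 1.csv) here ...
            }
          } finally {
            reader.close();
          }
        }
      }
    }

Plain java.io.File would work equally well on the same paths, e.g.
new File(p.toUri().getPath()), since getLocalCacheFiles() returns local
paths rather than HDFS ones.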
On Fri, Nov 23, 2012 at 2:16 AM, Barak Yaish wrote:
> Thanks for the quick response.
>
> I wanted to use DistributedCache to localize the files of interest to all
> nodes, so which API should I use in order to be able to read all those
> files, regardless of the node running the mapper?
>
>
> On Thu, Nov 22, 2012 at 10:38 PM, Harsh J wrote:
>>
>> You pointed out that you use:
>>
>> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open(
>> path )
>>
>> Note that this (FileSystem.get) will return an HDFS FileSystem by
>> default, while your path is a local one. You can either use the simple
>> java.io.File APIs or use
>> FileSystem.getLocal(context.getConfiguration()) [1] to get a local
>> filesystem handle that looks at file:/// paths rather than hdfs://
>> paths.
>>
>> [1]
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
>>
>> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish wrote:
>> > Hi,
>> >
>> > I've a 2-node cluster (v1.0.4), master and slave. On the master, in
>> > Tool.run() we add two files to the DistributedCache using
>> > addCacheFile(). The files do exist in HDFS. In Mapper.setup() we want
>> > to retrieve those files from the cache using FSDataInputStream fs =
>> > FileSystem.get( context.getConfiguration() ).open( path ). The problem
>> > is that for one file a FileNotFoundException is thrown, although the
>> > file exists on the slave node:
>> >
>> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException:
>> > File does not exist:
>> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> >
>> > ls -l on the slave:
>> >
>> > [hduser@slave ~]$ ll
>> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18
>> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> > [hduser@slave ~]$
>> >
>> > My questions are:
>> >
>> > Shouldn't all files exist on all nodes?
>> > What should be done to fix that?
>> >
>> > Thanks.
>>
>>
>> --
>> Harsh J

--
Harsh J