hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: FileNotFoundException when getting files from DistributedCache
Date Thu, 22 Nov 2012 20:50:54 GMT
DistributedCache files are localized onto the task node's local disk
(not HDFS), so read them from within tasks using the LocalFileSystem,
or plain java.io.File if you prefer.
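
For example, here is a minimal sketch of a setup() that reads the
localized copies (the class name and the "load the line" step are just
placeholders; it assumes Hadoop 1.x with the files having been added via
DistributedCache.addCacheFile() in the driver):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheReadingMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();

    // getLocalCacheFiles() returns the *local* paths the framework
    // copied onto this task's node, not the original HDFS paths.
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    if (cached == null) {
      return; // nothing was added to the cache
    }

    // Open with the local filesystem (file://), not the default hdfs://.
    FileSystem localFs = FileSystem.getLocal(conf);
    for (Path p : cached) {
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(localFs.open(p)));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          // ... load the line into whatever structure the job needs
        }
      } finally {
        reader.close();
      }
    }
  }
}

new File(p.toString()) works just as well if you'd rather stay with
plain java.io.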

On Fri, Nov 23, 2012 at 2:16 AM, Barak Yaish <barak.yaish@gmail.com> wrote:
> Thanks for the quick response.
>
> I wanted to use DistributedCache to localize the files of interest to all
> nodes, so which API should I use in order to be able to read all those
> files, regardless of which node runs the mapper?
>
>
> On Thu, Nov 22, 2012 at 10:38 PM, Harsh J <harsh@cloudera.com> wrote:
>>
>> You mentioned that you use:
>>
>> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open(
>> path )
>>
>> Note that FileSystem.get will return an HDFS FileSystem by default,
>> while your path is a local one. You can either use the plain
>> java.io.File APIs or use
>> FileSystem.getLocal(context.getConfiguration()) [1] to get a local
>> filesystem handle that resolves file:/// paths rather than hdfs://
>> ones.
>>
>> [1]
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
>>
>> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <barak.yaish@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I’ve a 2-node cluster (v1.04), master and slave. On the master, in
>> > Tool.run() we add two files to the DistributedCache using
>> > addCacheFile(). The files exist in HDFS. In Mapper.setup() we want
>> > to retrieve those files from the cache using FSDataInputStream fs =
>> > FileSystem.get( context.getConfiguration() ).open( path ). The
>> > problem is that for one file a FileNotFoundException is thrown,
>> > although the file exists on the slave node:
>> >
>> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException:
>> > File does not exist:
>> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> >
>> > ls -l on the slave:
>> >
>> > [hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>> > [hduser@slave ~]$
>> >
>> > My questions are:
>> >
>> > Shouldn't all files exist on all nodes?
>> > What should be done to fix that?
>> >
>> > Thanks.
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J
