hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Difference between HDFS and local filesystem
Date Sat, 26 Jan 2013 15:55:23 GMT
The local filesystem has no sense of being 'distributed'. If you run Hadoop in
distributed mode over file:// (the local FS), then unless the file:// path
being used is itself distributed (such as an NFS mount), your job's tasks
will fail on every node where the referenced files cannot be found.
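A toy Python model of the failure mode described above may help. It is not Hadoop code: `Node`, `run_task` and `run_job` are hypothetical names, and the model assumes one task per node with no retries, whereas the real scheduler assigns input splits to nodes and retries failed attempts before failing the job. It only shows why a file:// input works on nodes that can see the file and fails elsewhere, and why a shared mount such as NFS avoids that.

```python
from dataclasses import dataclass, field

INPUT = "/usr/local/ncdcinput/sample.txt"

@dataclass
class Node:
    """A worker node; local_files is what its own file:// namespace contains."""
    name: str
    local_files: set = field(default_factory=set)

def run_task(node: Node, path: str) -> str:
    """A (toy) map task opening file://<path> succeeds only if this node can see it."""
    return "ok" if path in node.local_files else "FileNotFound"

def run_job(nodes: list, path: str) -> dict:
    """Run one simplified task per node and collect each outcome."""
    return {n.name: run_task(n, path) for n in nodes}

# Case 1: plain local disks; the input exists only on node1.
local_nodes = [Node("node1", {INPUT}), Node("node2"), Node("node3")]

# Case 2: every node mounts the same shared directory (e.g. NFS),
# modelled here by all nodes referencing one shared set.
shared = {INPUT}
nfs_nodes = [Node(f"node{i}", shared) for i in range(1, 4)]

print(run_job(local_nodes, INPUT))
print(run_job(nfs_nodes, INPUT))
```

The first call reports `FileNotFound` for node2 and node3; the second reports `ok` for all three nodes, because they all see the same shared namespace.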

Essentially, for distributed operation MR relies on a distributed
filesystem, and the local filesystem is the opposite of that.

On Sat, Jan 26, 2013 at 9:19 PM, Sundeep Kambhampati
<kambhamp@cse.ohio-state.edu> wrote:
> Hi Users,
> I am kind of new to MapReduce programming and I am trying to understand the
> integration between MapReduce and HDFS.
> I understand that MapReduce can use HDFS for data access. But is it possible
> not to use HDFS at all and still run MapReduce programs?
> HDFS does file replication and partitioning. But if I use the following
> command to run the MaxTemperature example
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
> instead of
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> usr/local/ncdcinput/sample.txt usr/local/out4     ->> (this one uses the
> HDFS file system)
> then the job reads from and writes to the local file system when I run it
> in pseudo-distributed mode. Since it is a single node, there is no problem
> of non-local data.
> What happens in fully distributed mode? Will the files be copied to other
> machines, or will it throw errors? Will the files be replicated, and will
> they be partitioned for running MapReduce if I use the local file system?
> Can someone please explain?
> Regards
> Sundeep
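A minimal Python sketch of why the two commands in the question hit different filesystems. It assumes, as in the poster's pseudo-distributed setup, that the default filesystem (`fs.default.name` in Hadoop 1.x, `fs.defaultFS` in 2.x) points at HDFS. `resolve`, `DEFAULT_FS` and `WORKING_DIR` are hypothetical names and values; the logic only roughly mirrors how Hadoop's `Path`/`FileSystem` classes qualify a path: an explicit scheme picks that filesystem, a scheme-less path goes to the default filesystem, and a relative path is resolved against the HDFS working directory (by default the user's home, `/user/<username>`).

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical cluster settings, for illustration only.
DEFAULT_FS = "hdfs://namenode:9000"  # stands in for fs.default.name / fs.defaultFS
WORKING_DIR = "/user/sundeep"        # HDFS working dir, by default the user's home

def resolve(path: str, default_fs: str = DEFAULT_FS, working_dir: str = WORKING_DIR) -> str:
    """Return the fully qualified URI a job would use for `path` (toy model)."""
    if urlparse(path).scheme:
        # An explicit scheme (file://, hdfs://, ...) selects that filesystem directly.
        return path
    if not path.startswith("/"):
        # Relative paths are resolved against the working directory first.
        path = posixpath.join(working_dir, path)
    # Scheme-less paths go to the default filesystem.
    return default_fs + path

for p in ["file:///usr/local/ncdcinput/sample.txt",
          "usr/local/ncdcinput/sample.txt",
          "/usr/local/out4"]:
    print(p, "->", resolve(p))
```

Under these assumptions the second command would read `/user/<username>/usr/local/ncdcinput/sample.txt` in HDFS, not `/usr/local/ncdcinput/sample.txt`, while the file:// paths bypass HDFS entirely.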

Harsh J
