hadoop-common-user mailing list archives

From Sundeep Kambhampati <kambh...@cse.ohio-state.edu>
Subject Difference between HDFS and local filesystem
Date Sat, 26 Jan 2013 15:49:13 GMT
Hi Users,
I am fairly new to MapReduce programming, and I am trying to understand 
the integration between MapReduce and HDFS.
I understand that MapReduce can use HDFS for data access. But is it 
possible to run MapReduce programs without using HDFS at all?
HDFS handles file replication and partitioning. But suppose I use the 
following command to run the MaxTemperature example:

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature \
      file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4

instead of

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature \
      usr/local/ncdcinput/sample.txt usr/local/out4

(the second form, with no scheme prefix, uses HDFS).
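As I understand it from the configuration docs, paths without a scheme 
prefix fall back to the default filesystem set in conf/core-site.xml. 
A typical pseudo-distributed setting looks something like this (the 
host and port here are just illustrative; the property is named 
fs.defaultFS in newer releases):

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>

So my reading is that the file:// prefix overrides this default and 
points the job at the local filesystem instead.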

With the file:// form, the job reads from and writes to the local 
filesystem when I run in pseudo-distributed mode. Since it is a single 
node, there is no problem of non-local data.
What happens in fully distributed mode? Will the files be copied to the 
other machines, or will the job throw errors? Will the files be 
replicated and partitioned for MapReduce if I use the local filesystem?

Can someone please explain?

