hadoop-common-user mailing list archives

From Chaman Singh Verma <csv...@yahoo.com>
Subject One Simple Question About Hadoop DFS
Date Sun, 23 Mar 2008 09:00:18 GMT

I am exploring Hadoop and MapReduce and I have one very simple question.

I have a 500GB dataset on my local disk, and I have written both the Map and Reduce functions. How
should I start?

1. I copy the data from the local disk to DFS. I have configured DFS with 100 machines, and I hope
   that it will split the file across the 100 nodes (with some replication).

2. For MapReduce, should I specify 100 nodes in SetMaxMapTask()? If I specify
   fewer than 100, will the blocks migrate? If the blocks don't migrate, then
   why is this function provided to users? Why isn't the number of tasks
   taken from the startup script?

3. If I specify more than 100, will load balancing be done automatically,
   or does the user have to specify that as well?
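
A back-of-the-envelope sketch of the arithmetic behind question 1. It assumes a 64 MB block size and a replication factor of 3 (both are assumptions, not values stated in this message) and an idealized even spread of replicas across nodes. It only illustrates that DFS stores a file as many fixed-size blocks rather than as 100 equal pieces; it does not model the real HDFS placement policy, and `BlockMath` is a hypothetical helper.

```java
public class BlockMath {
    // Number of fixed-size blocks needed to hold a file (ceiling division).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Average block replicas per node, ASSUMING a perfectly even spread.
    static double replicasPerNode(long blocks, int replication, int nodes) {
        return (double) (blocks * replication) / nodes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long gb = 1024L * mb;
        long fileBytes = 500 * gb;   // the 500GB dataset from the question
        long blockBytes = 64 * mb;   // ASSUMPTION: 64 MB block size
        int replication = 3;         // ASSUMPTION: replication factor 3
        int nodes = 100;             // the 100-machine cluster from the question

        long blocks = blockCount(fileBytes, blockBytes);
        System.out.println("blocks = " + blocks);
        System.out.println("avg replicas per node = "
                + replicasPerNode(blocks, replication, nodes));
    }
}
```

Under these assumptions the 500GB file becomes 8000 blocks, or about 240 block replicas per node on average.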

Perhaps these are very simple questions, but I think MapReduce simplifies so many things
(compared to MPI-based programming) that beginners like me have a difficult time understanding
the model.

