hadoop-common-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Awesome post on Hadoop. Some questions...
Date Mon, 12 Dec 2011 17:10:24 GMT
I was really enthralled to read the post:
Great job.

Some related questions:

1. The article says that HDFS always keeps 2 copies of a block in the same
rack and the 3rd in a different rack. That seems to speed up only the
HDFS "put" (file creation) time. Wouldn't it be better to spread the
copies across 3 racks? What other advantages does this 2+1 approach have?
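To make question 1 concrete, here is a toy model (my own sketch, not from the article). It assumes the writer runs on a DataNode in "rackA", that the block is written through a simple chain (client -> 1st DataNode -> 2nd -> 3rd), and the node and rack names are hypothetical. It counts how many transfers in that chain cross a rack boundary, and how many distinct racks end up holding a copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    node: str  # hypothetical DataNode name
    rack: str  # hypothetical rack id


def cross_rack_hops(writer_rack: str, pipeline: List[Replica]) -> int:
    """Count transfers in a chained write pipeline that cross racks.

    The writer sends the block to pipeline[0], which forwards it to
    pipeline[1], which forwards it to pipeline[2], and so on.
    """
    hops = 0
    prev_rack = writer_rack
    for replica in pipeline:
        if replica.rack != prev_rack:
            hops += 1
        prev_rack = replica.rack
    return hops


def distinct_racks(pipeline: List[Replica]) -> int:
    # With copies on >= 2 racks, losing any single rack still leaves a copy.
    return len({r.rack for r in pipeline})


# 2+1: one copy local to the writer, the other two together on a remote rack.
two_plus_one = [Replica("dn1", "rackA"), Replica("dn4", "rackB"), Replica("dn5", "rackB")]
# 1+1+1: every copy on its own rack.
one_one_one = [Replica("dn1", "rackA"), Replica("dn4", "rackB"), Replica("dn7", "rackC")]

for name, layout in [("2+1", two_plus_one), ("1+1+1", one_one_one)]:
    print(name,
          "cross-rack hops:", cross_rack_hops("rackA", layout),
          "racks:", distinct_racks(layout),
          "survives one rack failure:", distinct_racks(layout) >= 2)
```

Under these assumptions the 2+1 layout pushes each block across a rack boundary once, the 1+1+1 layout twice, and both still survive the loss of a whole rack.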

2. In HDFS the client reads blocks sequentially. Why can't clients read
the blocks in parallel? Wouldn't that speed up reads from the client's
perspective?
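For question 2, this is roughly what I have in mind, as a sketch only: `fetch_block` is a hypothetical stand-in for streaming one block from a DataNode (it is not a real HDFS client call), and the thread pool size is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fetch_block(block_id: int) -> bytes:
    # Hypothetical stand-in for reading one block from a DataNode;
    # a real client would connect to a DataNode holding a replica.
    return f"<block {block_id}>".encode()


def read_sequential(block_ids: List[int]) -> bytes:
    # One block after another, as the client does today.
    return b"".join(fetch_block(b) for b in block_ids)


def read_parallel(block_ids: List[int], max_workers: int = 4) -> bytes:
    # Fetch several blocks concurrently; Executor.map yields results in
    # input order, so the file's bytes are reassembled in the right order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return b"".join(pool.map(fetch_block, block_ids))


blocks = list(range(8))
assert read_parallel(blocks) == read_sequential(blocks)
```

This would only pay off if the client is not already limited by its own network link and can buffer blocks that arrive ahead of the one it is currently consuming.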

3. There are some cases in which a DataNode daemon itself needs to
read a block of data from HDFS. When would a DataNode need to read
from other DataNodes? Is it when the split size is larger than the
block size? Even in that case, it's the TaskTracker that should ask
for the data, not the DataNode.

Prasenjit