hadoop-common-user mailing list archives

From shanmukhan battinapati <shanmukha...@gmail.com>
Subject HDFS Structure
Date Wed, 29 Dec 2010 04:57:10 GMT

I have a small question about how HDFS manages files internally.

Assume I have a NameNode and 2 DataNodes, and I have copied a CSV file of
size 80 MB into HDFS using the 'hadoop fs -copyFromLocal' command.

How will this file be stored in HDFS?

Will it be split into two parts, one of 64 MB (the default block size) and
one of the remaining 16 MB, and copied to the 2 DataNodes?
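
The arithmetic behind this question can be sketched as follows. This is only
an illustration, assuming the 64 MB default mentioned above and that a file
is cut at fixed byte offsets regardless of its content. The helper name
block_sizes is hypothetical, and the sketch says nothing about which
DataNode ends up holding which part.

```python
MB = 1024 * 1024

def block_sizes(file_size, block_size=64 * MB):
    """Sizes of the consecutive fixed-size pieces of a file of file_size bytes.

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    full, rest = divmod(file_size, block_size)
    # All pieces are block_size bytes except a possibly shorter last one.
    return [block_size] * full + ([rest] if rest else [])

sizes = block_sizes(80 * MB)
print([s // MB for s in sizes])  # [64, 16]
```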

If that is the case, and I run a map-reduce job on the two DataNodes, I may
get unexpected results, because the split does not follow line boundaries.
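
To make the concern concrete, here is a minimal sketch of one common
convention for reading lines from a byte range that may start or end in the
middle of a line: a line belongs to the range that contains its first byte.
This is an assumption-laden illustration in plain Python, not Hadoop's
actual code; the function read_split and the sample data are hypothetical.

```python
import io

def read_split(data, start, end):
    """Return the lines whose first byte lies in [start, end).

    Hypothetical sketch: a reader that does not start at offset 0 discards
    the partial line it lands in (the previous range owns it), and every
    reader finishes the last line it started, even past `end`.
    """
    f = io.BytesIO(data)
    if start > 0:
        # Step back one byte so a line starting exactly at `start` is kept.
        f.seek(start - 1)
        f.readline()
    lines = []
    while f.tell() < end:
        line = f.readline()
        if not line:
            break
        lines.append(line)
    return lines

data = b"a,1\nbb,22\nccc,333\n"
cut = 6  # falls in the middle of the second line
first = read_split(data, 0, cut)
second = read_split(data, cut, len(data))
print(first)   # [b'a,1\n', b'bb,22\n']
print(second)  # [b'ccc,333\n']
```

Under this convention every line is read exactly once, wherever the cut
falls, even though the underlying bytes are split without regard to lines.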

How can this type of issue be solved? Please help me.

Thanks & Regards
