hadoop-mapreduce-user mailing list archives

From Utkarsh Gupta <Utkarsh_Gu...@infosys.com>
Subject How HDFS divides Files into block
Date Fri, 18 May 2012 09:10:19 GMT

I have a question about HDFS that may be trivial, but I have not been able to work it out.

Since HDFS stores files in blocks of 64/128 MB, how does it split files?
The problem I see is this: suppose my input file contains a long string such as:


This string is to be processed in one map call. But because of the block boundary, part of the
line is in one block and the rest is in the next:

- this block goes to one mapper process
<end of block1>

- this block goes to another mapper process
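
To make the scenario concrete, here is a minimal Python sketch of what I am worried about. It is only a toy model, not Hadoop code: the 16-byte block size, the sample data, and the function name split_into_blocks are all made up for illustration, and it assumes blocks are cut purely at fixed byte offsets with no regard for line boundaries.

```python
# Toy illustration only: real HDFS blocks are 64/128 MB; 16 bytes is a
# hypothetical size chosen so the effect is visible on a tiny input.
BLOCK_SIZE = 16


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut data at fixed byte offsets, ignoring line boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# Three newline-terminated records; the middle one is longer than a block.
data = b"first line\nthis is one long record\nlast\n"
blocks = split_into_blocks(data)

for n, block in enumerate(blocks, start=1):
    # A block that does not end with a newline ends in the middle of a record.
    cut = not block.endswith(b"\n")
    print("block%d: %r (ends mid-record: %s)" % (n, block, cut))
```

In this toy model the long record starts in the first block, fills the second, and finishes in the third, so no single block holds the whole line.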

How does HDFS avoid this scenario?

Thanks and Regards
Utkarsh Gupta
