hadoop-mapreduce-user mailing list archives

From "Kartashov, Andy" <Andy.Kartas...@mpac.ca>
Subject block-size vs split-size
Date Tue, 27 Nov 2012 15:05:45 GMT

I understand that, if not specified, the default HDFS block size is 64 MB. You can control this
value by changing the dfs.block.size property, e.g. increasing it to 128 MB (64 MB x 2) or
256 MB (64 MB x 4). Every time we change this property, we must re-import the data for the
change to take effect.
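The block arithmetic above can be sketched as follows: it counts how many blocks a file occupies at different block sizes. `BlockCountSketch` and `blockCount` are hypothetical names, not Hadoop APIs. The only assumptions are that HDFS stores a file as fixed-size blocks with a possibly shorter last block, and that a file keeps the block size it was written with. That second point is why existing data has to be rewritten to pick up a new dfs.block.size.

```java
// Hypothetical helper (not part of Hadoop): counts the HDFS blocks that a file
// of a given size occupies for a given block size. HDFS fixes a file's block
// size when the file is written, so changing dfs.block.size only affects files
// written afterwards.
class BlockCountSketch {
    static final long MB = 1024L * 1024L;

    // Number of blocks = ceil(fileSize / blockSize); the last block may be partial.
    static long blockCount(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long file = 300 * MB;
        for (long bs : new long[] {64 * MB, 128 * MB, 256 * MB}) {
            System.out.printf("%d MB blocks -> %d blocks%n", bs / MB, blockCount(file, bs));
        }
    }
}
```

For a 300 MB file this prints 5, 3 and 2 blocks for 64 MB, 128 MB and 256 MB block sizes, respectively.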

My question is about split size. I understand it is used by MapReduce to assign tasks to TaskTrackers:

1. If mapred.min.split.size is not specified, is the split size by default equal to the
configured block size, or is it fixed at 64 MB?

2. If you increase the block size to, say, 128 MB, will the split size (if not specified) be
128 MB, matching the block size, or will it remain at the 64 MB default?

3. Say your input file is 128 MB. With the default 64 MB block size, the file occupies two
blocks, and the JobTracker will create two map tasks (one per block). But what if you set
mapred.min.split.size to 128 MB? I suppose that, even though there are two blocks in HDFS,
there will be only one input split for MapReduce. Is mapred.min.split.size designed to
override the block size when preparing InputSplits?

4. Given the ability to set block size and split size individually, was the main purpose of
this to let you tune one independently of the other? If my understanding of point 3 is
correct, then say you imported your data at a 128 MB block size but later realized you should
have gone higher. Instead of re-importing all your data at 256 MB per block, you could change
the split size property to 256 MB. Am I grasping this concept correctly?
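To make questions 1 to 4 concrete, here is a minimal standalone sketch, assuming the split logic of the new-API `org.apache.hadoop.mapreduce.lib.input.FileInputFormat` in Hadoop 1.x: `computeSplitSize` returns `max(minSize, min(maxSize, blockSize))`, and a remainder of up to 10% over the split size (`SPLIT_SLOP = 1.1`) is folded into the last split. The class name and the `splitLengths` helper are hypothetical and do not call Hadoop. The old `org.apache.hadoop.mapred` API differs slightly, since it also uses a goal size derived from the requested number of map tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone re-implementation for illustration only (it does not call Hadoop),
// modeled on the new-API FileInputFormat split computation.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;
    static final double SPLIT_SLOP = 1.1; // tail of up to 10% extra joins the last split

    // minSize plays the role of mapred.min.split.size (effectively 1 byte by default),
    // maxSize that of mapred.max.split.size (Long.MAX_VALUE by default).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each input split for a single file.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining); // final (possibly shorter or slightly longer) split
        }
        return splits;
    }

    public static void main(String[] args) {
        // Q1/Q3: 128 MB file, 64 MB blocks, no minimum set -> 2 splits.
        System.out.println(splitLengths(128 * MB, 64 * MB, 1, Long.MAX_VALUE).size());
        // Q3: same file, minimum split size raised to 128 MB -> 1 split.
        System.out.println(splitLengths(128 * MB, 64 * MB, 128 * MB, Long.MAX_VALUE).size());
        // Q2: file written with 128 MB blocks, no minimum set -> 1 split.
        System.out.println(splitLengths(128 * MB, 128 * MB, 1, Long.MAX_VALUE).size());
        // Q4: 512 MB file in 128 MB blocks, minimum split size 256 MB -> 2 splits.
        System.out.println(splitLengths(512 * MB, 128 * MB, 256 * MB, Long.MAX_VALUE).size());
    }
}
```

Under these assumptions the split size follows the block size unless mapred.min.split.size or mapred.max.split.size pushes it away from that value, so raising the minimum can yield fewer, larger splits without rewriting the data. One trade-off, as far as the code suggests, is locality: each split is scheduled near the hosts of the block where it starts, so the rest of a multi-block split may be read over the network.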

Please help.

