hadoop-common-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject HDFS data and non-aligned splits
Date Thu, 23 May 2013 17:53:17 GMT
What happens when MapReduce (MR) produces input splits that don't align with HDFS block boundaries?
I've read that MR tries to place splits near block boundaries to improve data locality, but
isn't there always some slop where a record straddles a block boundary, so that the task needs
an extra HDFS connection just to fetch the rest of the record from the next block? Does this
have a noticeable impact on performance? Are there file formats that attempt to enforce data alignment?
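
To make the scenario concrete, here is a minimal Python sketch of the convention that line-oriented readers commonly use for non-aligned splits: a reader whose split does not start at offset 0 discards its first (possibly partial) line, and every reader keeps going past the end of its split until it finishes the line it is on. This is a simulation over an in-memory byte string, not Hadoop code; the function name `read_split`, the sample data, and the 10-byte split size are assumptions made up for illustration. The `overrun` value is the number of bytes read beyond the split's end, which is the part that could live in another block.

```python
def read_split(data: bytes, start: int, end: int):
    """Simulate one reader over the split [start, end) of newline-delimited data.

    Returns (records, overrun), where overrun is the number of bytes read
    past `end` in order to finish the last record.
    """
    pos = start
    if start != 0:
        # Discard the first line: it may be partial, and the previous
        # split's reader is responsible for it.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    # Read any line that begins at or before `end`, even if it finishes
    # after `end`. This is what lets the next reader safely skip it.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl + 1
        records.append(data[pos:line_end].rstrip(b"\n"))
        pos = line_end

    overrun = max(0, pos - end)
    return records, overrun


# Hypothetical sample: four records, split every 10 bytes regardless of
# where the records fall.
data = b"alpha\nbravo\ncharlie\ndelta\n"
splits = [(0, 10), (10, 20), (20, len(data))]
results = [read_split(data, s, e) for s, e in splits]
for (s, e), (recs, overrun) in zip(splits, results):
    print(f"split [{s}, {e}): records={recs} overrun={overrun} bytes")
```

Under this convention each record is read exactly once, and the cost of misalignment per split is bounded by roughly one record's worth of extra bytes, which is the "slop" being asked about.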

