hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Changing the Java heap
Date Thu, 26 Apr 2012 20:56:14 GMT
Not sure of your question. 

The heap size of the child JVMs is independent of how files are split on HDFS.

I suggest you look at Tom White's book (Hadoop: The Definitive Guide), which covers HDFS and how files are split into blocks.

Files are split into blocks of a fixed size, 64MB by default.
Your record boundaries do not necessarily fall on block boundaries, so one task may read
the beginning of the last record in block A and then finish reading it from the start of block
B. A different task that starts with block B skips the first n bytes until it reaches the start
of a record.
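
To make the boundary handling concrete, here is a rough sketch in Python. It is a simplified simulation of the idea, not Hadoop's actual record reader code. The function names (read_split, read_all), the newline delimiter, and the tiny 8-byte block size are all assumptions chosen for illustration.

```python
from typing import List


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the newline-delimited records owned by one split (illustrative only)."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: skip ahead to just past the first newline.
        # Whatever precedes it belongs to a record the previous split finishes.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # Read every record that *starts* at or before the end of this split,
    # even if its bytes run on into the next block.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])
            break
        records.append(data[pos:newline])
        pos = newline + 1
    return records


def read_all(data: bytes, block_size: int) -> List[List[bytes]]:
    """Cut data into fixed-size splits and read each one independently."""
    return [
        read_split(data, start, min(block_size, len(data) - start))
        for start in range(0, len(data), block_size)
    ]


if __name__ == "__main__":
    sample = b"alpha\nbravo\ncharlie\ndelta\n"
    for index, records in enumerate(read_all(sample, block_size=8)):
        print(index, records)
```

With the 8-byte blocks above, "bravo" and "charlie" both straddle a block boundary, yet each record is returned whole by exactly one split and none is read twice. The same reasoning applies to whitespace: a block can end in the middle of a word, but the reader for each split works in terms of whole records, not raw blocks.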



On Apr 26, 2012, at 3:46 PM, Barry, Sean F wrote:

> Within my small 2 node cluster I set up my 4 core slave node to have 4 task trackers
> and I also limited my java heap size to -Xmx1024m
> Is there a possibility that when the data gets broken up that it will break it at a place
> in the file that is not a whitespace? Or is that already handled when the data on HDFS is
> broken up into blocks?
> -SB
