hadoop-hdfs-dev mailing list archives

From Grandl Robert <rgra...@yahoo.com.INVALID>
Subject HDFS writes a lot to disks
Date Fri, 17 Apr 2015 22:39:36 GMT
I am running some Pig queries on top of Tez on YARN. My Pig query has a large stage that reads
45 GB of data and outputs less than 1 MB. The stage is processed by 200 tasks on a 9-machine
cluster, with up to 8 tasks running in parallel, each with 7 GB of memory.

I am monitoring the job's resource usage, and I observe that the stage writes 53 GB of data
to disk. This confuses me, since the intermediate data size is less than 1 MB.

Do you guys have any idea what might be the reason? Is it possible that the processing code
in the tasks actually writes data to disk as part of the processing phase?

Thank you,
Robert

(PS: I am looking at the IOSTAT counters, namely MB read and MB written.)
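For anyone trying to reproduce the measurement: assuming these are the Linux iostat figures, iostat derives MB read/written from the kernel's /proc/diskstats counters, which report sectors in 512-byte units. Below is a minimal sketch under that assumption that computes per-device MB read and written between two snapshots; the function names are hypothetical. Note that these are device-level totals, so they include writes from every process on the machine, not only the Tez tasks.

```python
# Minimal sketch: derive iostat-style MB read/written from two /proc/diskstats snapshots.
# Assumes Linux /proc/diskstats format; parse_diskstats and mb_delta are hypothetical names.

SECTOR_BYTES = 512  # /proc/diskstats always counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats-format text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # Field 2 is the device name, field 5 sectors read, field 9 sectors written.
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def mb_delta(before, after, device):
    """MB read and MB written on `device` between two snapshots (1 MB = 2**20 bytes)."""
    r0, w0 = parse_diskstats(before)[device]
    r1, w1 = parse_diskstats(after)[device]

    def to_mb(sectors):
        return sectors * SECTOR_BYTES / 2**20

    return to_mb(r1 - r0), to_mb(w1 - w0)


# Example snapshots (made-up numbers, for illustration only).
before = "   8       0 sda 100 0 2048 10 50 0 4096 20 0 30 30"
after = "   8       0 sda 200 0 4096 20 150 0 8192 40 0 60 60"
read_mb, written_mb = mb_delta(before, after, "sda")
```

On a live host, the two snapshot strings would come from reading /proc/diskstats before and after the stage runs.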
