hadoop-hdfs-user mailing list archives

From Panshul Whisper <ouchwhis...@gmail.com>
Subject too many memory spills
Date Wed, 06 Mar 2013 14:28:58 GMT

I have a 9 GB file containing approximately 109.5 million records.
I run a Pig script on this file that does the following:
1. Groups the records by one field of the file
2. Counts the number of records in each group
3. Stores the result in a CSV file using the standard PigStorage(",")
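
A script of this shape might look like the sketch below. The input and
output paths, the alias names, and the schema are placeholders for
illustration only, not the actual ones from the job:

```
-- Hypothetical input path and schema; only the first field is used for grouping.
records = LOAD 'input/records.csv' USING PigStorage(',')
          AS (key:chararray, payload:chararray);

-- 1. Group by one field of the file.
grouped = GROUP records BY key;

-- 2. Count the number of records in each group.
counts = FOREACH grouped GENERATE group AS key, COUNT(records) AS num_records;

-- 3. Store the result as comma-separated text (hypothetical output path).
STORE counts INTO 'output/counts' USING PigStorage(',');
```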

The job completes successfully, but the job details show a large number of
spilled records. *Out of 109.5 million records, approximately 48 million
are reported as spilled.*

I am running it on a *4-node cluster, each node with a dual-core processor
and 4 GB of RAM*.

How can I minimize the number of spilled records? Execution becomes very
slow once the spilling starts.

Any suggestions are welcome.

Thank you,

Ouch Whisper
