hadoop-hdfs-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject produce a large sequencefile (1TB)
Date Mon, 19 Aug 2013 22:09:08 GMT
Hi Hadoop users and developers,

I have a use case where I need to produce a large sequence file of 1 TB in size,
but each datanode has only 200 GB of storage (I have 30 datanodes in total).

The problem is that no single reducer can hold 1 TB of data during the
reduce phase to generate a single sequence file, even with aggressive
compression. Whichever datanode hosts the reducer will run out of space,
since this is a single-reducer job.
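For illustration, a minimal sketch of the arithmetic behind the constraint, using the numbers from the message above (the replication behavior noted in the comment is HDFS's default first-replica placement; treat it as an assumption about this cluster's configuration):

```java
public class CapacityCheck {
    public static void main(String[] args) {
        long nodeCapacityGb = 200;   // per-datanode storage (from the message)
        int dataNodes = 30;          // cluster size (from the message)
        long outputGb = 1024;        // 1 TB target sequence file

        // Aggregate capacity is more than enough for the file itself.
        long clusterCapacityGb = nodeCapacityGb * dataNodes;
        System.out.println("Aggregate cluster capacity: " + clusterCapacityGb + " GB");

        // But HDFS (by default) places the first replica of each block on the
        // datanode local to the writer, so a single reducer's node must absorb
        // the entire output file locally.
        boolean fitsOnOneNode = outputGb <= nodeCapacityGb;
        System.out.println("1 TB output fits on one node: " + fitsOnOneNode);
    }
}
```

So the cluster as a whole (6 TB) could store the file, but no single writer node can.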

Any comment and help is appreciated.

Jerry
