hadoop-common-user mailing list archives

From Bing Jiang <jiangbinglo...@gmail.com>
Subject Re: produce a large sequencefile (1TB)
Date Tue, 20 Aug 2013 02:55:26 GMT
Hi Jerry,

I wonder whether it would be acceptable to use multiple reducers and generate
several MapFiles (each an IndexFile plus a DataFile).

I would like to understand the real difficulty with post-processing the
output of multiple reducers. Perhaps there are application-specific
constraints?
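To make the multi-reducer idea concrete: with Hadoop's default HashPartitioner, each reducer writes one sorted output file, and a client can later locate any key by recomputing the partition from the key's hash, then searching only that one file. Below is a minimal sketch of this lookup scheme; it uses in-memory TreeMaps as stand-ins for the per-reducer MapFiles, and names like `PartitionedLookup` are illustrative, not Hadoop API:

```java
import java.util.TreeMap;

public class PartitionedLookup {
    // One sorted map per reducer output; a stand-in for MapFile.Reader[].
    private final TreeMap<String, String>[] parts;

    @SuppressWarnings("unchecked")
    PartitionedLookup(int numReducers) {
        parts = new TreeMap[numReducers];
        for (int i = 0; i < numReducers; i++) parts[i] = new TreeMap<>();
    }

    // Same formula Hadoop's default HashPartitioner uses for getPartition().
    int partition(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % parts.length;
    }

    void put(String key, String value) {
        parts[partition(key)].put(key, value);   // reducer i receives this key
    }

    String lookup(String key) {
        return parts[partition(key)].get(key);   // search only the matching part
    }

    public static void main(String[] args) {
        PartitionedLookup idx = new PartitionedLookup(30); // 30 reducers
        idx.put("row-42", "payload");
        System.out.println(idx.lookup("row-42")); // prints "payload"
    }
}
```

Because the same partition function is applied at write and read time, the key is always found in the single file its reducer produced, so no merge into one giant file is needed for random access.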

2013/8/20 Jerry Lam <chilinglam@gmail.com>

> Hi Bing,
> You are correct. The local storage does not have enough capacity to hold
> the temporary files generated by the mappers. Since we want a single
> sequence file at the end, we are forced to use 1 reducer.
> The use case is that we want to generate an index for the 1TB sequence
> file so that we can randomly access each row in it. In practice, this is
> simply a MapFile.
> Any idea on how to resolve this dilemma would be greatly appreciated.
> Jerry
> On Mon, Aug 19, 2013 at 8:14 PM, Bing Jiang <jiangbinglover@gmail.com> wrote:
>> Hi, Jerry.
>> I think you are worried about the volume of the MapReduce local files,
>> but would you give us more details about your application?
>>  On Aug 20, 2013 6:09 AM, "Jerry Lam" <chilinglam@gmail.com> wrote:
>>> Hi Hadoop users and developers,
>>> I have a use case where I need to produce a large sequence file of 1TB
>>> in size, while each datanode has only 200GB of storage; however, I have
>>> 30 datanodes. The problem is that no single reducer can hold 1TB of
>>> data during the reduce phase to generate a single sequence file, even
>>> if I use aggressive compression. Any datanode will run out of space
>>> since this is a single-reducer job.
>>> Any comments and help are appreciated.
>>> Jerry
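For reference, the back-of-envelope arithmetic behind the thread: a single reducer must materialize roughly 1TB locally, which exceeds any one datanode's 200GB, while spreading the work over 30 reducers leaves each with only about 34GB. A small check (binary units assumed):

```java
public class ReducerCapacity {
    public static void main(String[] args) {
        long totalBytes = 1L << 40;       // 1 TB of total output
        long nodeCapacity = 200L << 30;   // 200 GB of local storage per datanode
        int reducers = 30;                // one reducer per datanode

        long perReducer = totalBytes / reducers;  // ~34 GB each

        System.out.println(totalBytes > nodeCapacity);  // single reducer overflows: true
        System.out.println(perReducer < nodeCapacity);  // 30 reducers fit comfortably: true
    }
}
```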

Bing Jiang
weibo: http://weibo.com/jiangbinglover
BLOG: www.binospace.com
BLOG: http://blog.sina.com.cn/jiangbinglover
Focus on distributed computing, HDFS/HBase
