crunch-user mailing list archives

From 陈竞 <cj.mag...@gmail.com>
Subject Re: temporary table size is 0, which makes reducer number too small
Date Tue, 18 Oct 2016 07:30:57 GMT
Issue filed: CRUNCH-624.

Link:
https://issues.apache.org/jira/browse/CRUNCH-624?jql=project%20%3D%20CRUNCH

2016-10-18 13:54 GMT+08:00 Josh Wills <josh.wills@gmail.com>:

> Yep, that's right-- can you file a JIRA, and I'll post the patch?
>
> On Mon, Oct 17, 2016 at 10:52 PM, 陈竞 <cj.magina@gmail.com> wrote:
>
>> I may have found the root cause in my case:
>>
>> public void materializeAt(SourceTarget<S> sourceTarget) {
>>   this.materializedAt = sourceTarget;
>>   this.size = materializedAt.getSize(getPipeline().getConfiguration());
>> }
>>
>>
>> @Override
>> public long getSize() {
>>     if (size < 0) {
>>         this.size = getSizeInternal();
>>     }
>>     return size;
>> }
>>
>> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is
>> split to create a temporary table. The sourceTarget is bound to the new
>> temporary table, whose path has only just been created, so its size is 0
>> and this.size is set to 0. Later, when getSize() is called to set the
>> number of reducers, the check size < 0 fails, so the cached 0 is returned,
>> which makes the number of reducers too small.
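
To make the effect of a cached size of 0 concrete, here is a minimal, self-contained sketch of a size-based reducer heuristic. It is not Crunch's actual code: the class name, the 1 GB-per-reducer target, and the cap of 500 are assumptions chosen only for illustration.

```java
// Hypothetical stand-in for a size-based reducer heuristic; not taken from Crunch.
final class ReducerEstimate {
    // Assumed target of about 1 GB of input per reducer (illustrative value).
    static final long BYTES_PER_REDUCER = 1_000_000_000L;
    // Assumed upper bound on the number of reducers (illustrative value).
    static final int MAX_REDUCERS = 500;

    // One reducer plus one more per full BYTES_PER_REDUCER of input, capped at MAX_REDUCERS.
    static int recommendedReducers(long inputBytes) {
        long wanted = 1 + inputBytes / BYTES_PER_REDUCER;
        return (int) Math.min(wanted, MAX_REDUCERS);
    }

    public static void main(String[] args) {
        // Size cached from a freshly created, still empty temporary path.
        System.out.println(recommendedReducers(0L));
        // Size of the same data once the upstream job has written it (example value).
        System.out.println(recommendedReducers(250_000_000_000L));
    }
}
```

With any heuristic of this shape, a reported size of 0 collapses to a single reducer regardless of how much data the upstream job eventually writes.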
>>
>> So I think materializeAt() should check the sourceTarget's size, like this:
>>
>> public void materializeAt(SourceTarget<S> sourceTarget) {
>>   this.materializedAt = sourceTarget;
>>   long size = materializedAt.getSize(getPipeline().getConfiguration());
>>   // Only cache a positive size; a freshly created path reports 0.
>>   if (size > 0) {
>>     this.size = size;
>>   }
>> }
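
The difference between the two versions of materializeAt() can be reproduced outside Crunch with a small stand-in class. This is only a sketch: CachedSizeDemo, SizedCollection, and the LongSupplier that plays the role of getSizeInternal() are hypothetical names, and the target's size is passed in directly rather than read from a SourceTarget.

```java
import java.util.function.LongSupplier;

// Hypothetical miniature of a collection that caches its size; not Crunch code.
final class CachedSizeDemo {
    static final class SizedCollection {
        private long size = -1;               // -1 means "not yet known"
        private final LongSupplier estimate;  // plays the role of getSizeInternal()

        SizedCollection(LongSupplier estimate) {
            this.estimate = estimate;
        }

        // Original behavior: cache whatever the target reports, even 0.
        void materializeAtUnguarded(long targetSize) {
            this.size = targetSize;
        }

        // Proposed behavior: cache the target's size only when it is positive.
        void materializeAtGuarded(long targetSize) {
            if (targetSize > 0) {
                this.size = targetSize;
            }
        }

        // Falls back to the estimate only while the cached size is still negative.
        long getSize() {
            if (size < 0) {
                size = estimate.getAsLong();
            }
            return size;
        }
    }

    public static void main(String[] args) {
        SizedCollection before = new SizedCollection(() -> 5_000_000L);
        before.materializeAtUnguarded(0L);
        System.out.println(before.getSize()); // cached 0 hides the estimate

        SizedCollection after = new SizedCollection(() -> 5_000_000L);
        after.materializeAtGuarded(0L);
        System.out.println(after.getSize());  // estimate is still used
    }
}
```

In the guarded variant an empty target leaves the size unset, so the later getSize() call still falls back to the estimate.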
>>
>>
>>
>> 2016-10-17 11:19 GMT+08:00 David Ortiz <dpo5003@gmail.com>:
>>
>>> That gets tricky if you have input data that is heavily filtered
>>> though.  Perhaps play around with the scale factor on operations that may
>>> blow up data?
>>>
>>> On Sun, Oct 16, 2016, 10:04 PM 陈竞 <cj.magina@gmail.com> wrote:
>>>
>>>> That's a solution, but since users may not know exactly which step will
>>>> produce a temporary table, I think setting the number of reducers
>>>> automatically would improve the user experience. Maybe we can set the
>>>> number of reducers to 1/3 of the number of mappers before submitting a
>>>> job if one of its inputs is a temporary table.
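
The fallback proposed above could look roughly like the following. This is a sketch of the idea only, not anything that exists in Crunch: the class and method names are hypothetical, and the floor of one reducer is an added assumption so that small jobs never end up with zero reducers.

```java
// Hypothetical sketch of the "1/3 of the mappers" fallback; not part of Crunch.
final class TempInputReducers {
    // Returns the reducer count to use for a job.
    // mapperCount: number of map tasks the job will run.
    // hasTemporaryInput: true if any input is a temporary (intermediate) table.
    // sizeBasedEstimate: reducer count derived from the reported input size.
    static int reducersFor(int mapperCount, boolean hasTemporaryInput, int sizeBasedEstimate) {
        if (!hasTemporaryInput) {
            return sizeBasedEstimate;
        }
        // Integer division rounds down; never go below one reducer (assumption).
        return Math.max(1, mapperCount / 3);
    }

    public static void main(String[] args) {
        System.out.println(reducersFor(90, true, 1));  // temporary input: use mappers / 3
        System.out.println(reducersFor(90, false, 7)); // regular input: keep the size-based value
    }
}
```

As the reply in this thread points out, a mapper-based rule like this can over-allocate reducers when the map stage filters out most of its input.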
>>>>
>>>> 2016-10-14 18:59 GMT+08:00 David Ortiz <dpo5003@gmail.com>:
>>>>
>>>> You can manually set the reducer number using the conf object among
>>>> other things.
>>>>
>>>> On Fri, Oct 14, 2016, 5:43 AM 陈竞 <cj.magina@gmail.com> wrote:
>>>>
>>>> Hi, I found that if the pipeline produces a temporary table, the number
>>>> of reducers for a job whose input is that temporary table becomes too
>>>> small, since the temporary table has no content yet.
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>


-- 
陈竞 (Jing Chen), High Performance Computer Center, Institute of Computing Technology, Chinese Academy of Sciences
Jing Chen HPCC.ICT.AC China
