crunch-user mailing list archives

From 陈竞 <cj.mag...@gmail.com>
Subject Re: temporary table size is 0, which makes reducer number too small
Date Tue, 18 Oct 2016 05:52:32 GMT
I may have found the root cause in my case:

public void materializeAt(SourceTarget<S> sourceTarget) {
  this.materializedAt = sourceTarget;
  this.size = materializedAt.getSize(getPipeline().getConfiguration());
}


@Override
public long getSize() {
    if (size < 0) {
        this.size = getSizeInternal();
    }
    return size;
}

PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is
split and a temporary table is created. The sourceTarget is bound to the
new temporary table, whose size is 0 because its path has only just been
created, so this.size is set to 0. Later, when getSize() is called to
choose the reduce number, size is 0 rather than negative, so
getSizeInternal() is skipped and 0 is returned, which makes the reduce
number too small.

So I think materializeAt() should check the sourceTarget's size and only
cache it when it is positive, like below:

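public void materializeAt(SourceTarget<S> sourceTarget) {
  this.materializedAt = sourceTarget;
  long size = materializedAt.getSize(getPipeline().getConfiguration());
  // Only cache a positive size; otherwise leave this.size unchanged so
  // that getSize() can still fall back to getSizeInternal().
  if (size > 0) {
    this.size = size;
  }
}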
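To illustrate the difference between the two versions, here is a minimal
standalone sketch of the caching behaviour. It is a toy model, not the real
Crunch code: the class CachedSizeCollection, the guardZero flag, and the
estimate of 1,000,000 bytes are hypothetical names and values made up for
this example, and it assumes the cached size starts out negative, as the
size < 0 check in getSize() suggests.

```java
/** Toy model of the size caching discussed above; not the actual Crunch classes. */
public class SizeCacheSketch {

    /** Hypothetical stand-in for a collection that caches its size. */
    static class CachedSizeCollection {
        private long size = -1L;            // assumed initial value: "not computed yet"
        private final long estimatedSize;   // stands in for getSizeInternal()
        private final boolean guardZero;    // true = apply the proposed check

        CachedSizeCollection(long estimatedSize, boolean guardZero) {
            this.estimatedSize = estimatedSize;
            this.guardZero = guardZero;
        }

        /** targetSize stands in for materializedAt.getSize(conf). */
        void materializeAt(long targetSize) {
            if (!guardZero || targetSize > 0) {
                this.size = targetSize;
            }
        }

        long getSize() {
            if (size < 0) {
                size = estimatedSize;       // fall back to the estimate
            }
            return size;
        }
    }

    public static void main(String[] args) {
        long estimate = 1_000_000L;

        // Current behaviour: the empty temporary table's size (0) is cached.
        CachedSizeCollection current = new CachedSizeCollection(estimate, false);
        current.materializeAt(0L);
        System.out.println("current: " + current.getSize());   // prints "current: 0"

        // Proposed behaviour: a size of 0 is ignored, so the estimate is used.
        CachedSizeCollection patched = new CachedSizeCollection(estimate, true);
        patched.materializeAt(0L);
        System.out.println("patched: " + patched.getSize());   // prints "patched: 1000000"
    }
}
```

In this model a target that already has data still overrides the estimate
in both versions; only the freshly created, empty target is treated
differently.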



2016-10-17 11:19 GMT+08:00 David Ortiz <dpo5003@gmail.com>:

> That gets tricky if you have input data that is heavily filtered though.
> Perhaps play around with the scale factor on operations that may blow up
> data?
>
> On Sun, Oct 16, 2016, 10:04 PM 陈竞 <cj.magina@gmail.com> wrote:
>
>> That's a solution, but since users may not clearly know which step will
>> produce a temporary table, I think setting the reduce number automatically
>> will improve the user experience. Maybe we can set the reduce number to
>> 1/3 of the mapper number before submitting jobs if one of the job inputs
>> is a temporary table.
>>
>> 2016-10-14 18:59 GMT+08:00 David Ortiz <dpo5003@gmail.com>:
>>
>> You can manually set the reducer number using the conf object among other
>> things.
>>
>> On Fri, Oct 14, 2016, 5:43 AM 陈竞 <cj.magina@gmail.com> wrote:
>>
>> Hi, I found that if the pipeline produces a temporary table, the reduce
>> number of a job whose input table is a temporary table becomes too
>> small, since the temporary table has no content.
>>
>>
>>
>>
>> --
>> 陈竞, Institute of Computing Technology, Chinese Academy of Sciences, High Performance Computer Center
>> Jing Chen HPCC.ICT.AC China
>>
>


-- 
陈竞, Institute of Computing Technology, Chinese Academy of Sciences, High Performance Computer Center
Jing Chen HPCC.ICT.AC China
