crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-624) temporary table size is 0, which makes reducer number too small
Date Fri, 21 Oct 2016 19:07:58 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-624:
------------------------------
    Attachment: CRUNCH-624.patch

Patch for this.

> temporary table size is 0, which makes reducer number too small
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-624
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-624
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: JingChen
>            Assignee: Josh Wills
>         Attachments: CRUNCH-624.patch
>
>
> If the pipeline produces a temporary table, the number of reducers for a table whose input
is that temporary table may become very small in some cases, because the temporary table
has no content yet when its size is recorded.
> I think I have found the root cause in my case:
> {code:title=PCollectionImpl.java|borderStyle=solid}
> public void materializeAt(SourceTarget<S> sourceTarget) {
>   this.materializedAt = sourceTarget;
>   this.size = materializedAt.getSize(getPipeline().getConfiguration());
> }
>
> @Override
> public long getSize() {
>   if (size < 0) {
>     this.size = getSizeInternal();
>   }
>   return size;
> }
> {code}
> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split to create a
temporary table, and sourceTarget is bound to the new temporary table. Because that table's
path was just created, its size is 0, so this.size is set to 0. Later, when getSize() is
invoked to set the number of reducers, the cached size of 0 is not negative, so
getSizeInternal() is never called and 0 is returned, which makes the number of reducers
too small.
> So I think materializeAt() should check sourceTarget's size before caching it, as below:
> {code:title=PCollectionImpl.java|borderStyle=solid}
> public void materializeAt(SourceTarget<S> sourceTarget) {
>   this.materializedAt = sourceTarget;
>   long size = materializedAt.getSize(getPipeline().getConfiguration());
>   // Only cache a positive size; otherwise getSize() falls back to getSizeInternal().
>   if (size > 0) {
>     this.size = size;
>   }
> }
> {code}
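To make the mechanism described in the report concrete, the following is a small, self-contained Java model of the size caching in the quoted materializeAt()/getSize() code, comparing the original behavior with the proposed check. It is a sketch under stated assumptions, not Crunch code: SizeCachingCollection and estimateReducers are hypothetical names, the size estimate of 10 GB, the 1 GB-per-reducer figure and the cap of 500 reducers are made-up illustration values, and the reducer heuristic only assumes that the reducer count grows with the estimated input size.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical model of the size caching described in CRUNCH-624.
 * SizeCachingCollection is NOT Crunch's PCollectionImpl; it only mirrors
 * the quoted materializeAt()/getSize() logic.
 */
public class Crunch624Sketch {

  static class SizeCachingCollection {
    private final LongSupplier sizeEstimator; // stands in for getSizeInternal()
    private final boolean applyProposedFix;
    private long size = -1;                   // -1 means "not computed yet"

    SizeCachingCollection(LongSupplier sizeEstimator, boolean applyProposedFix) {
      this.sizeEstimator = sizeEstimator;
      this.applyProposedFix = applyProposedFix;
    }

    /** Mirrors materializeAt(): sizeOnDisk plays the role of sourceTarget.getSize(conf). */
    void materializeAt(long sizeOnDisk) {
      if (applyProposedFix) {
        // Proposed fix: only cache a positive size reported by the target.
        if (sizeOnDisk > 0) {
          this.size = sizeOnDisk;
        }
      } else {
        // Original behavior: a just-created, empty temp path caches size 0.
        this.size = sizeOnDisk;
      }
    }

    /** Mirrors getSize(): recomputes only when nothing has been cached. */
    long getSize() {
      if (size < 0) {
        size = sizeEstimator.getAsLong();
      }
      return size;
    }
  }

  /** Hypothetical heuristic: one reducer per bytesPerReducer, at least one, capped. */
  static int estimateReducers(long sizeBytes, long bytesPerReducer, int maxReducers) {
    int recommended = 1 + (int) (sizeBytes / bytesPerReducer);
    return Math.min(recommended, maxReducers);
  }

  public static void main(String[] args) {
    long estimatedBytes = 10L * 1000 * 1000 * 1000; // assumed upstream estimate: 10 GB
    long bytesPerReducer = 1000L * 1000 * 1000;     // assumed 1 GB per reducer

    SizeCachingCollection original = new SizeCachingCollection(() -> estimatedBytes, false);
    original.materializeAt(0L); // temp directory was just created, so it is empty

    SizeCachingCollection proposed = new SizeCachingCollection(() -> estimatedBytes, true);
    proposed.materializeAt(0L);

    System.out.println("original: size=" + original.getSize()
        + ", reducers=" + estimateReducers(original.getSize(), bytesPerReducer, 500));
    System.out.println("proposed: size=" + proposed.getSize()
        + ", reducers=" + estimateReducers(proposed.getSize(), bytesPerReducer, 500));
  }
}
```

Running main prints `original: size=0, reducers=1` and `proposed: size=10000000000, reducers=11`. One consequence of the check, visible in the model, is that a target which is genuinely empty also leaves the size uncached, so getSize() falls back to the estimate rather than reporting 0.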



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
