crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-519) Plan dot file can display more information
Date Fri, 22 May 2015 03:04:17 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555500#comment-14555500 ]

Micah Whitacre commented on CRUNCH-519:
---------------------------------------

We should also add some tests around the labeling of reducers and the reducer count (I don't
see any in the latest patch).

Regarding maintaining state: do we really need to? The two methods that were added could
just as easily be written without any stored state, something like:

{code}
  public void configureShuffle(Job job) {
    ptype.configureShuffle(job, groupingOptions);
    // Only let the planner choose the reducer count when the user did not set one.
    if (!isNumReduceTasksSetByUser()) {
      int numReduceTasks = getNumReduceTasks();
      if (numReduceTasks > 0) {
        job.setNumReduceTasks(numReduceTasks);
        LOG.info("Setting num reduce tasks to {}", numReduceTasks);
      } else {
        LOG.warn("Attempted to set a negative number of reduce tasks");
      }
    }
  }

  // Computed on demand, so no numReduceTasks field has to be kept in sync.
  public int getNumReduceTasks() {
    if (isNumReduceTasksSetByUser()) {
      return groupingOptions.getNumReducers();
    }
    return PartitionUtils.getRecommendedPartitions(this, getPipeline().getConfiguration());
  }

  // True only when GroupingOptions explicitly requests a positive number of reducers.
  public boolean isNumReduceTasksSetByUser() {
    return groupingOptions != null && groupingOptions.getNumReducers() > 0;
  }
{code}

The PartitionUtils method call is actually pretty cheap, so recomputing the value whenever it
is needed should be fine.
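To illustrate why recomputing is cheap, here is a minimal sketch of the kind of arithmetic such a recommendation involves. It assumes the count has the form 1 + (input size x scale factor) / bytes per reducer, optionally capped at a maximum; the class name, method name, and parameters are hypothetical stand-ins, not the actual PartitionUtils code.

```java
// Hypothetical sketch of a reducer-count recommendation; NOT the real PartitionUtils code.
final class ReducerEstimate {
  private ReducerEstimate() {}

  /**
   * Recommends a reducer count from an input size in bytes.
   * Assumption: the estimated size is inputBytes * scaleFactor, and each reducer
   * should handle roughly bytesPerReducer bytes. A non-positive maxReducers means no cap.
   */
  static int recommendedReducers(long inputBytes, double scaleFactor,
      long bytesPerReducer, int maxReducers) {
    if (bytesPerReducer <= 0) {
      throw new IllegalArgumentException("bytesPerReducer must be positive");
    }
    // Apply the (assumed) scale factor to get the estimated data size.
    long estimatedBytes = (long) (Math.max(0L, inputBytes) * scaleFactor);
    // Always recommend at least one reducer, plus one per full chunk of data.
    long recommended = 1 + Math.max(0L, estimatedBytes) / bytesPerReducer;
    if (maxReducers > 0 && recommended > maxReducers) {
      return maxReducers;
    }
    return (int) Math.min(recommended, Integer.MAX_VALUE);
  }
}
```

Everything here is constant-time arithmetic over an already-estimated size, which is the point: there is little to gain from caching the result in a field.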

Regarding labeling manual vs. automatic: it would probably be good to have a clearer label than
A & M. Specifically, in the manual case it would be nice to avoid someone seeing "3 M" or "3M"
and thinking 3 million instead of "3 Manual".
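One way to avoid the ambiguity is to spell the source out in full. The sketch below is a hypothetical helper (the class and method names are illustrative, not part of the patch or of DotfileWriter) that produces labels such as "3 reducers (manual)":

```java
// Hypothetical label helper for the plan dot file; names are illustrative only.
final class ReducerLabel {
  private ReducerLabel() {}

  /** Spells out how the count was chosen so "3 M" cannot be misread as 3 million. */
  static String format(int numReducers, boolean setByUser) {
    String source = setByUser ? "manual" : "automatic";
    String noun = numReducers == 1 ? "reducer" : "reducers";
    return numReducers + " " + noun + " (" + source + ")";
  }
}
```

A unit test for the labeling can then assert on these strings directly, e.g. that `format(3, true)` yields `"3 reducers (manual)"`.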


> Plan dot file can display more information
> ------------------------------------------
>
>                 Key: CRUNCH-519
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-519
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ron Hashimshony
>            Assignee: Josh Wills
>         Attachments: CRUNCH-519-1.diff, CRUNCH-519-2.patch, CRUNCH-519.diff
>
>
> The current plan dot file displays the jobs well, with nice names and arrows.
> However, it does not explain how the planner decided on the number of reducers, which is
> based on the input data size, the scale factor, and the desired size per reducer.
> I suggest adding this information to the dot file.
> An addition to the DotfileWriter class can do this easily.
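As a rough illustration of the proposal, the sketch below builds a single Graphviz node statement whose label lists the job name, the reducer count, and, for automatically chosen counts, the inputs behind it. The class, method, and parameter names are hypothetical; this is not DotfileWriter's actual API, only one possible shape for the added output.

```java
// Hypothetical sketch of a dot node carrying reducer-planning details.
// Names are illustrative; they are not DotfileWriter's actual API.
final class ReducerNodeSketch {
  private ReducerNodeSketch() {}

  /** Escapes characters that would break a double-quoted DOT string. */
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /**
   * Builds one DOT node statement. The Java literal "\\n" emits the two characters
   * backslash-n, which Graphviz renders as a line break inside a label.
   */
  static String node(String id, String jobName, long inputBytes, double scaleFactor,
      long bytesPerReducer, int numReducers, boolean setByUser) {
    StringBuilder label = new StringBuilder(escape(jobName));
    label.append("\\nreducers: ").append(numReducers)
        .append(setByUser ? " (manual)" : " (automatic)");
    // Only an automatic count depends on size, scale factor, and bytes per reducer.
    if (!setByUser) {
      label.append("\\ninput size: ").append(inputBytes).append(" bytes")
          .append("\\nscale factor: ").append(scaleFactor)
          .append("\\nbytes per reducer: ").append(bytesPerReducer);
    }
    return "\"" + escape(id) + "\" [label=\"" + label + "\"];";
  }
}
```

For example, `node("job1", "Sort", 2500, 1.0, 1000, 3, false)` yields `"job1" [label="Sort\nreducers: 3 (automatic)\ninput size: 2500 bytes\nscale factor: 1.0\nbytes per reducer: 1000"];`.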



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
