crunch-dev mailing list archives

From "Christian Tzolov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-438) Visualizations of some important internal/intermediate pipeline planning states
Date Mon, 07 Jul 2014 13:10:34 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053632#comment-14053632 ]

Christian Tzolov commented on CRUNCH-438:
-----------------------------------------

h5. Some unresolved topics:
# The BASE_GRAPH_PLANE_DOTFILE and SPLIT_GRAPH_PLANE_DOTFILE diagrams are generated inside the while loop of the MSCRPlanner#plan() method:
{code:title=MSCRPlanner.java|borderStyle=solid}
public MRExecutor plan(...) {
   ...
   while (!targetDeps.isEmpty()) {
      ...
      // create the BASE_GRAPH_PLANE_DOTFILE diagram
      // create the SPLIT_GRAPH_PLANE_DOTFILE diagram
      ...
   }
   ...
}
{code}
As a result, the current implementation registers only the graphs from the last iteration; the graphs from earlier iterations are lost!
# For the RTNode diagram (RTNODES_PLAN_DOTFILE), I have not figured out how to connect the dependent jobs.
MRJob#getDependentJobs() returns a list of dependent jobs, but it does not tell which output should be wired to which input. The wiring has to mirror the exact logic in the code. If I am not mistaken, the wiring information has to be retrieved from the job's completion hook attributes.
# To decouple the tracing logic (e.g. dotfile generation) from the main planner code, I've been considering a tracing interface (below). One or more implementations would be registered with the planner and notified on each planning event.
{code:title=PlannerTracker.java|borderStyle=solid}
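package org.apache.crunch.impl.mr.plan;

import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.crunch.PCollection;
import org.apache.crunch.Target;
import org.apache.crunch.impl.mr.MRJob;

// Notified by MSCRPlanner#plan() about intermediate planning states (e.g. to write dotfiles).
interface PlannerTracker {
   void onPCollectionPlan(String name, Map<PCollection<?>, Set<Target>> outputs);
   void onBaseGraphPlan(String name, Graph graph, Map<PCollection<?>, Set<Target>> outputs);  // before the GBK split
   void onSplitGraphPlan(String name, Graph graph, Map<PCollection<?>, Set<Target>> outputs,
         List<List<Vertex>> components);  // after the GBK split
   void onRunTimeConfiguration(String name, List<MRJob> jobs);
   void onPipelinePlan(String name, List<JobPrototype> protos);
}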
{code}
This is pretty rough, but hopefully it will help start the discussion.
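As a rough illustration of the register-and-notify idea, here is a minimal sketch under simplified assumptions. `GraphTracker`, `CompositeTracker` and `DotRecordingTracker` are hypothetical names, not Crunch classes, and a graph is passed as an already-rendered DOT string instead of Crunch's `Graph`. Passing the loop iteration index with each event is one possible way to keep the diagrams from every iteration of the `while` loop instead of only the last one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified tracker interface: the proposal above passes Crunch
// planner types (Graph, Vertex, MRJob, ...); here a graph is already rendered
// as a DOT string so the sketch compiles on its own.
interface GraphTracker {
  void onBaseGraphPlan(String name, int iteration, String dot);
  void onSplitGraphPlan(String name, int iteration, String dot);
}

// Fans each planner event out to every registered tracker, in registration order.
class CompositeTracker implements GraphTracker {
  private final List<GraphTracker> trackers = new ArrayList<>();

  void register(GraphTracker tracker) {
    trackers.add(tracker);
  }

  @Override
  public void onBaseGraphPlan(String name, int iteration, String dot) {
    for (GraphTracker t : trackers) {
      t.onBaseGraphPlan(name, iteration, dot);
    }
  }

  @Override
  public void onSplitGraphPlan(String name, int iteration, String dot) {
    for (GraphTracker t : trackers) {
      t.onSplitGraphPlan(name, iteration, dot);
    }
  }
}

// Records one dotfile per loop iteration. Keying on the iteration index means a
// later pass through the planner's while loop does not overwrite an earlier one.
class DotRecordingTracker implements GraphTracker {
  private final Map<String, String> dotfiles = new LinkedHashMap<>();

  @Override
  public void onBaseGraphPlan(String name, int iteration, String dot) {
    dotfiles.put(name + ".base_graph." + iteration, dot);
  }

  @Override
  public void onSplitGraphPlan(String name, int iteration, String dot) {
    dotfiles.put(name + ".split_graph." + iteration, dot);
  }

  Map<String, String> getDotfiles() {
    return dotfiles;
  }
}
```

In the planner, each pass through `while (!targetDeps.isEmpty())` would increment a counter and hand it to the composite; further trackers (e.g. one that writes files or logs) could then be registered without touching the planning code itself.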


> Visualizations of some important internal/intermediate pipeline planning states
> -------------------------------------------------------------------------------
>
>                 Key: CRUNCH-438
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-438
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Christian Tzolov
>            Assignee: Christian Tzolov
>         Attachments: CRUNCH-438.2.patch, CRUNCH-438.patch
>
>
> To improve the understandability of the pipeline planning stages, it would help to visualize some intermediate planning states, such as:
> - PCollection lineage (visualizing the output-PCollection-targets structure)
> - MSCRPlanner's planning graphs before and after the dependent GBK nodes are split up
> - The RTNode hierarchy, along with the input and output configurations as persisted in the Configuration before the pipeline executes.
> Most of this information can be intercepted in the MSCRPlanner#plan() method.
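For the PCollection lineage item, a minimal sketch of turning the output-to-targets structure into a Graphviz DOT digraph, assuming the map has already been reduced to display names: plain strings stand in for `PCollection<?>` and `Target`, and `LineageDot` is a hypothetical helper, not part of Crunch.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper: renders an output map (collection -> targets) as a
// Graphviz DOT digraph. Strings stand in for Crunch's PCollection and Target.
class LineageDot {
  static String render(Map<String, Set<String>> outputs) {
    StringBuilder sb = new StringBuilder("digraph lineage {\n");
    for (Map.Entry<String, Set<String>> e : outputs.entrySet()) {
      for (String target : e.getValue()) {
        // One edge per (collection, target) pair; both names are quoted for DOT.
        sb.append("  ").append(quote(e.getKey()))
          .append(" -> ").append(quote(target)).append(";\n");
      }
    }
    return sb.append("}\n").toString();
  }

  // Wraps an identifier in double quotes, escaping backslashes and embedded quotes.
  private static String quote(String id) {
    return "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }
}
```

Edge order follows the iteration order of the map and sets passed in, so insertion-ordered collections keep the generated dotfile stable between runs.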



--
This message was sent by Atlassian JIRA
(v6.2#6252)
