Date: Sat, 27 Oct 2012 15:39:48 +0200
From: Matthias Friedrich
To: crunch-dev@incubator.apache.org
Subject: Re: Generating DOT files for Crunch job plans
Message-ID: <20121027133948.GA8240@mafr.de>
In-Reply-To: <90B8B61F-E772-4C6B-8EB7-1AEB09D4B563@gmail.com>

Hi,

On Saturday, 2012-10-27, Gabriel Reid wrote:
> In the few times that I've debugged issues in the planner in Crunch,
> it always takes me a bit of time to figure out (again) how things
> work there. I've been thinking/planning of writing some more inline
> docs and doing a bit of refactoring in the code to help myself (and
> others) with doing this in the future, but something else that I was
> thinking of was the generation of DOT[1] files for pipelines so that
> it's easier to visualize what's going on.

That's a great idea; it will help win over prospective users who
wonder whether Crunch performs as well as a sequence of hand-written
MR jobs.

There are other ways in Java to generate graphs, BTW, but in my
experience none of them produces output that matches dot/graphviz. In
my opinion we shouldn't run dot ourselves, though, because most users
don't have dot installed. Let's just generate the output and let users
call dot themselves.

> I'm sure that functionality like this can be useful (at least to me,
> as I was just using it in a somewhat ad-hoc way to debug
> CRUNCH-102), but I'm not sure if this is something we want to expose
> easily, or keep pretty hidden to just use for debugging. I believe
> Pig provides this same functionality with the "explain" command.
> Any thoughts on adding this, particularly around how we could/should
> expose it in the API?

I think we should make it available to users and make it really easy
to access. I'm not sure about the API, though.
Since it's really cheap to create, we could always generate dot
output, store it inside the Configuration instance and provide a
static utility class to access it?

A while ago we discussed moving debugging/log4j manipulation logic
out of the MRPipeline; perhaps we can use a single CrunchDebug
utility for both.

Regards,
  Matthias
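
To make the proposal above concrete, here is a minimal sketch of what such a utility might look like. Everything in it is an assumption for illustration: the class name CrunchDebugSketch, the key "crunch.planner.dotfile", and the method names are hypothetical and not part of Crunch. A java.util.Properties object stands in for Hadoop's Configuration so that the example runs without any dependencies, and the plan is reduced to a plain adjacency list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of a static debug utility; not an existing Crunch class. */
final class CrunchDebugSketch {

    // Hypothetical configuration key; Crunch does not define this name.
    static final String DOT_KEY = "crunch.planner.dotfile";

    private CrunchDebugSketch() {}

    /** Renders an adjacency list (node -> successors) as a DOT digraph. */
    static String toDot(String graphName, Map<String, List<String>> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(graphName)).append(" {\n");
        for (Map.Entry<String, List<String>> entry : edges.entrySet()) {
            // Declare every node so that nodes without edges still show up.
            sb.append("  ").append(quote(entry.getKey())).append(";\n");
            for (String target : entry.getValue()) {
                sb.append("  ").append(quote(entry.getKey()))
                  .append(" -> ").append(quote(target)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Wraps an ID in double quotes, escaping embedded double quotes. */
    static String quote(String id) {
        return "\"" + id.replace("\"", "\\\"") + "\"";
    }

    /** Stores the DOT text in the configuration stand-in. */
    static void storePlan(Properties conf, String dot) {
        conf.setProperty(DOT_KEY, dot);
    }

    /** Returns the stored DOT text, or an empty string if none was stored. */
    static String getPlanDotFile(Properties conf) {
        return conf.getProperty(DOT_KEY, "");
    }

    public static void main(String[] args) {
        // A made-up linear plan; the node names are illustrative only.
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put("input", Arrays.asList("parseLines"));
        plan.put("parseLines", Arrays.asList("groupByKey"));
        plan.put("groupByKey", Arrays.asList("output"));
        plan.put("output", Collections.<String>emptyList());

        Properties conf = new Properties();
        storePlan(conf, toDot("plan", plan));
        System.out.print(getPlanDotFile(conf));
    }
}
```

Users who do have Graphviz installed could redirect the printed text to a file and render it themselves, for example with "dot -Tpng plan.dot -o plan.png". Keeping the generation free of any call to the dot binary matches the point above that we shouldn't run dot ourselves.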