crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: About status web page
Date Wed, 27 Feb 2013 04:29:09 GMT
Hey Chao,

Does the asynchronous pipeline execution work in
https://issues.apache.org/jira/browse/CRUNCH-156 help with this? Right now,
it returns a ListenableFuture<PipelineResult> from runAsync, but we could
add support for returning the graphviz plan as well, so that you could fire
up a server to visualize the file while the job was running.
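
To make that concrete, here is a rough sketch of the server side, under some
assumptions. PlanStatusServer and its update() method are hypothetical names,
not anything in Crunch, and it uses the JDK's built-in
com.sun.net.httpserver.HttpServer rather than jetty or Hadoop's HttpServer.
Passing port 0 lets the OS pick an unused port, and start() returns the port
that was actually bound so the caller can log it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper (not part of Crunch): serves the latest DOT plan over HTTP. */
class PlanStatusServer {
    private final HttpServer server;
    // Latest plan text; replaced by the pipeline thread, read by HTTP handler threads.
    private final AtomicReference<String> dot = new AtomicReference<>("digraph G {}");

    PlanStatusServer(int port) throws IOException {
        // Port 0 asks the OS for any unused port. Binding to loopback only is an
        // assumption made here to limit exposure; it is not a security design.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/plan.dot", exchange -> {
            byte[] body = dot.get().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    /** Starts serving and returns the port that was actually bound. */
    int start() {
        server.start();
        return server.getAddress().getPort();
    }

    /** Publishes a new plan, e.g. whenever the job status changes. */
    void update(String newDot) {
        dot.set(newDot);
    }

    void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws Exception {
        PlanStatusServer status = new PlanStatusServer(0);
        int port = status.start();
        System.out.println("Plan available at http://127.0.0.1:" + port + "/plan.dot");
        status.update("digraph G { a -> b; }");
        // A real client would block here on the future returned by runAsync.
        status.stop();
    }
}
```

The client would start this only when the user opts in, call update() with
the plan string, and stop it once the future completes. Rendering the dot
text (viz.js or a server-side graphviz call) is left out on purpose.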

J


On Tue, Feb 26, 2013 at 8:03 PM, Chao Shi <stepinto@live.com> wrote:

> Yes, it is for debugging and monitoring.
>
> I'm developing a complex pipeline (30+ MRs plus lots of joins). I have a
> hard time understanding which part of the pipeline takes the most running
> time and how much intermediate output it produces. Crunch's optimization
> work is great, but it makes the execution plan difficult to understand.
> Each time I modify the pipeline, I have to dump the dot file and run
> graphviz to generate a new picture and check whether anything is wrong.
>
> About security, I'm not familiar with how Hadoop does it. I will try to
> reuse hadoop's HttpServer (does it have something to do with security?).
> The bottom line is to make this feature disabled by default, and let users
> enable it at their own risk.
>
> If this feature is enabled, the user can choose between an unused port and
> a specified port. I don't yet know how the user would learn the randomly
> picked port (via the log?). I will work on a prototype version first
> and see if the status page is generally useful.
>
> On Wed, Feb 27, 2013 at 2:30 AM, Matthias Friedrich <matt@mafr.de> wrote:
>
> > Hi Chao,
> >
> > sounds interesting - just a couple of things that come to mind:
> >
> > Is this intended as a debugging aid or for operational monitoring?
> >
> > A Crunch job is a temporary thing, to me this doesn't sound like a
> > good match for a web service because it disappears after a (possibly
> > short) time. Also, when multiple jobs are executed concurrently from
> > the same machine, you can't work with a well-known port, you'd have to
> > pick an unused port for each job.
> >
> > It also looks to me like this has security implications? Right now,
> > Crunch is just a client library and we're part of Hadoop's security
> > framework. A web service we might have to secure in some way.
> >
> > Regards,
> >   Matthias
> >
> > On Tuesday, 2013-02-26, Chao Shi wrote:
> > > Hi Crunch Devs,
> > >
> > > I'm interested in adding a web status page to crunch. I'm working on a
> > > prototype first, which simply runs a jetty server and renders the dot
> > > file produced by DotFileWriter in the browser. The dot rendering work is
> > > done by viz.js <https://github.com/mdaines/viz.js>. It can successfully
> > > render the plan into SVG.
> > >
> > > I think there are 2 issues I hit with viz.js:
> > >
> > > 1. The license of viz.js is unclear. It is compiled from GraphViz source
> > > code with emscripten. GraphViz is Eclipse Public License 1.0.
> > >
> > > 2. viz.js is big and slow. It is a 1.4MB compressed JS. It takes 1 or 2
> > > seconds on my laptop to render my pipeline (30+ MRs). I think it would be
> > > good to have the graph refresh frequently and show the running status of
> > > the pipeline (i.e. whether MRs are done or not). Thus the rendering would
> > > be too slow.
> > >
> > > Another approach is to call the graphviz command on the server side if
> > > viz.js is not an option. I can't find any pure Java implementation of
> > > graphviz.
> > >
> > > Looking forward to your advice.
> > >
> > > Thanks,
> > > Chao
> >
> >
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
