crunch-dev mailing list archives

From Matthias Friedrich <m...@mafr.de>
Subject Re: About status web page
Date Sun, 17 Mar 2013 08:03:48 GMT
Hi,

I'm still not convinced that running a web service from a batch job is
a good technical fit because it is transient in nature. For small jobs
you only have a second or two to hit reload in your browser.

How about leaving the server out of crunch core and just adding
functionality for a Pipeline to post its Configuration to an external
web service? In debug mode, the Pipeline could do an HTTP PUT to a
well-known address (http://localhost:10080/jobs/, but that could be
configurable). When debugging, users would start the web service
separately if they need it.
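
A minimal sketch of the client side of this proposal, assuming a hypothetical `StatusPublisher` helper. It uses `java.util.Properties` as a stand-in for Hadoop's `Configuration` so it runs on the plain JDK, and the request body format (`Properties.store` output) is also an assumption. Only the default address http://localhost:10080/jobs/ comes from the message above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Properties;

/**
 * Hypothetical helper: PUTs a pipeline's settings to an external status service.
 * Properties stands in for Hadoop's Configuration so the sketch needs only the JDK.
 */
final class StatusPublisher {
    // Default address from the proposal; meant to be overridable.
    static final String DEFAULT_BASE = "http://localhost:10080/jobs/";

    /** Sends the settings as an HTTP PUT to base + jobName (assumed URL-safe); returns the status code. */
    static int publish(String base, String jobName, Properties conf) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        conf.store(body, null); // "key=value" lines in ISO-8859-1, preceded by a date comment
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(base + jobName).toURL().openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/plain; charset=ISO-8859-1");
            conn.setConnectTimeout(1000);
            conn.setReadTimeout(1000);
            try (OutputStream out = conn.getOutputStream()) {
                body.writeTo(out);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Debug-mode hook: an unreachable status service must never fail the pipeline. */
    static boolean tryPublish(String base, String jobName, Properties conf) {
        try {
            int code = publish(base, jobName, conf);
            return code >= 200 && code < 300;
        } catch (IOException e) {
            return false; // service not running; the job carries on
        }
    }
}
```

Swallowing the `IOException` in `tryPublish` reflects the "start the web service separately if you need it" idea: when nobody is listening, the PUT simply fails and the batch job is unaffected.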

The advantage is that crunch core stays clean and the web service
sees more than just one Pipeline, so it can display a history of
executed Pipelines.
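
A minimal sketch of the separate service that could collect those PUTs and keep a history of executed Pipelines. The `JobHistoryServer` class, its plain-text listing at `GET /jobs/`, and the choice of the JDK's `com.sun.net.httpserver.HttpServer` are all assumptions for illustration; because this service lives outside crunch core, its dependencies would not affect Crunch users.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone status service that remembers every Pipeline that reported in. */
final class JobHistoryServer {
    /** One report: arrival time, job name taken from the URL path, and the raw settings. */
    record Entry(Instant received, String jobName, String config) {}

    private static final String PREFIX = "/jobs/";
    private final List<Entry> history = new ArrayList<>();
    private final HttpServer server;

    /** Binds to loopback only; pass 10080 for the proposal's address, or 0 for any free port. */
    JobHistoryServer(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext(PREFIX, this::handle);
    }

    void start() { server.start(); }
    void stop() { server.stop(0); }
    int port() { return server.getAddress().getPort(); }

    /** Snapshot of all reports received so far, oldest first. */
    List<Entry> history() {
        synchronized (history) { return List.copyOf(history); }
    }

    private void handle(HttpExchange ex) throws IOException {
        try {
            String method = ex.getRequestMethod();
            if (method.equals("PUT")) {
                // PUT /jobs/<name>: record the report.
                String name = ex.getRequestURI().getPath().substring(PREFIX.length());
                String body = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.ISO_8859_1);
                synchronized (history) { history.add(new Entry(Instant.now(), name, body)); }
                ex.sendResponseHeaders(204, -1);
            } else if (method.equals("GET")) {
                // GET /jobs/: one line per report, "<time> <job name>".
                StringBuilder sb = new StringBuilder();
                for (Entry e : history()) {
                    sb.append(e.received()).append(' ').append(e.jobName()).append('\n');
                }
                byte[] out = sb.toString().getBytes(StandardCharsets.UTF_8);
                ex.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
                ex.sendResponseHeaders(200, out.length);
                try (OutputStream os = ex.getResponseBody()) { os.write(out); }
            } else {
                ex.sendResponseHeaders(405, -1); // method not allowed
            }
        } finally {
            ex.close();
        }
    }
}
```

Because the service outlives any single batch job, the history is still there after a short Pipeline has finished, which addresses the "only a second or two to hit reload" problem.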

BTW, what's the license of viz.js?

Regards,
  Matthias

On Sunday, 2013-03-17, Chao Shi wrote:
> My previous post does not seem to have been delivered, so I'm trying
> again with the patch gzipped. The patch is large because it contains
> jquery and viz.js.
> 
> On Fri, Mar 15, 2013 at 11:02 PM, Chao Shi <stepinto@live.com> wrote:
> 
> > Hey guys,
> >
> > I have a very simple prototype for this. It uses DotfileWriter to generate
> > the dot file and renders it with viz.js.
> >
> > There are lots of things that could be improved:
> > - show completed and running jobs in different colors, perhaps along with
> > job progress as a percentage
> > - interactive UI features, e.g. clicking a job navigates to its JobTracker
> > (JT) page, auto refresh
> > - configurable port
> > - .. and more
> >
> > I'd like to hear what you think of the prototype before continuing. A
> > quick way to demo it is to apply the patch and run some integration tests.
> > While the tests are running, you can open http://localhost:10080.
> >
> > On Wed, Feb 27, 2013 at 3:30 PM, Matthias Friedrich <matt@mafr.de> wrote:
> >
> >> On Wednesday, 2013-02-27, Chao Shi wrote:
> >> > I'm developing a complex pipeline (30+ MRs plus lots of joins). I have
> >> > a hard time understanding which parts of the pipeline take the most
> >> > running time and how much intermediate output they produce. Crunch's
> >> > optimization work is great, but it makes the execution plan difficult
> >> > to understand. Each time I modify the pipeline, I have to dump the dot
> >> > file, run graphviz to generate a new picture, and check whether
> >> > anything is wrong.
> >> >
> >> > About security, I'm not familiar with how Hadoop handles it. I will try
> >> > to reuse Hadoop's HttpServer (does it have anything to do with
> >> > security?). The bottom line is to make this feature disabled by
> >> > default and let users enable it at their own risk.
> >>
> >> OK, sounds good.
> >>
> >> > If this feature is enabled, the user can choose either an unused port
> >> > or a specified port. I don't yet know how the user would find out the
> >> > randomly picked port (via the log?). I will work on a prototype first
> >> > and see whether the status page is generally useful.
> >>
> >> Yeah, logging the URL would probably be the only thing that works. Not
> >> counting fancy stuff like MDNS ;-)
> >>
> >> In my opinion, we should try to get this done with the dependencies that
> >> we already get through Hadoop. Each additional library we add to Crunch
> >> will cause interoperability problems for someone.
> >>
> >> Regards,
> >>   Matthias
> >>
> >>
> >
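
The opt-in, port-selection and URL-logging ideas from the quoted thread can be sketched as follows. Everything named here is hypothetical: the `StatusPageLauncher` class, the property keys `crunch.status.enabled` and `crunch.status.port`, and the placeholder page. It uses the JDK's `com.sun.net.httpserver.HttpServer` instead of the Hadoop HttpServer mentioned in the thread, only so that it runs without Hadoop on the classpath.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.Properties;
import java.util.logging.Logger;

/** Hypothetical bootstrap for an opt-in status page. Property names are illustrative only. */
final class StatusPageLauncher {
    static final String ENABLED_KEY = "crunch.status.enabled"; // hypothetical key
    static final String PORT_KEY = "crunch.status.port";       // hypothetical key; 0 = any free port
    private static final Logger LOG = Logger.getLogger(StatusPageLauncher.class.getName());

    /** Starts the page only when explicitly enabled; the caller stops the returned server. */
    static Optional<HttpServer> maybeStart(Properties conf) throws IOException {
        if (!Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"))) {
            return Optional.empty(); // disabled by default
        }
        int port = Integer.parseInt(conf.getProperty(PORT_KEY, "0"));
        // Port 0 asks the OS for an unused port; binding to loopback limits exposure.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", ex -> {
            byte[] body = "pipeline status placeholder\n".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, body.length);
            try (OutputStream os = ex.getResponseBody()) { os.write(body); }
        });
        server.start();
        // The real port is known only after binding, so log the full URL for the user.
        LOG.info("Pipeline status page: http://127.0.0.1:" + server.getAddress().getPort() + "/");
        return Optional.of(server);
    }
}
```

Asking for port 0 and reading the bound port back from `getAddress()` avoids guessing at a free port, and the logged URL is then the one piece of information the user needs.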


