htrace-dev mailing list archives

From Dylan Hutchison <dhutc...@cs.washington.edu>
Subject Re: Trace through MapReduce?
Date Sun, 27 Nov 2016 00:52:30 GMT
Thanks Colin.  I managed to get this working by

   1. Starting a trace in the MapReduce driver.
   2. Recording the trace and parent span IDs in the driver and passing
   them as parameters to the Input/Output formats of MR.
   3. Initializing the SpanReceivers in the Mappers and Reducers, or doing
   a no-op if they are already initialized.  (i.e., making the
   initialization idempotent)
   4. Starting a new child span in each Mapper and Reducer under the
   driver's trace and parent span IDs.  (a rough sketch of steps 1-4
   follows the list)
   5. Closing the spans at the Mappers/Reducers when they finish.
   6. Closing the root span at the driver once everything finishes.  (In my
   application the driver waits for the MapReduce job to finish.)

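Roughly, the driver and task sides look something like the sketch below,
in HTrace 3.1.0 terms.  The MrTracing class, the configuration key names,
and the SpanReceiver setup are placeholders of my own, not part of HTrace,
Hadoop, or Accumulo; only the org.apache.htrace calls are existing API.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.htrace.Sampler;
    import org.apache.htrace.Span;
    import org.apache.htrace.Trace;
    import org.apache.htrace.TraceInfo;
    import org.apache.htrace.TraceScope;

    public class MrTracing {

      // Placeholder configuration keys for shipping the IDs to the tasks.
      static final String TRACE_ID_KEY = "my.app.trace.id";
      static final String SPAN_ID_KEY  = "my.app.parent.span.id";

      private static volatile boolean receiverInitialized = false;

      /** Driver side (steps 1-2): start the root span, record its IDs in the job conf. */
      public static TraceScope startRootSpan(Configuration conf) {
        TraceScope scope = Trace.startSpan("mapreduce-driver", Sampler.ALWAYS);
        Span root = scope.getSpan();
        conf.setLong(TRACE_ID_KEY, root.getTraceId());
        conf.setLong(SPAN_ID_KEY, root.getSpanId());
        return scope;  // the driver closes this once the job finishes (step 6)
      }

      /** Step 3: idempotent SpanReceiver setup; safe to call from every task. */
      private static synchronized void ensureReceiverInitialized(Configuration conf) {
        if (receiverInitialized) {
          return;  // already set up in this JVM, so do nothing
        }
        // Register whatever SpanReceiver you use on the client side, e.g. via
        // Trace.addReceiver(...) with Accumulo's ZooTraceClient.  The exact
        // construction depends on your setup, so it is left out of this sketch.
        receiverInitialized = true;
      }

      /** Mapper/Reducer side (step 4): open a child span under the driver's span. */
      public static TraceScope startChildSpan(Configuration conf, String description) {
        ensureReceiverInitialized(conf);
        long traceId = conf.getLong(TRACE_ID_KEY, -1L);
        long parentSpanId = conf.getLong(SPAN_ID_KEY, -1L);
        if (traceId == -1L || parentSpanId == -1L) {
          // No parent recorded in the conf; fall back to a fresh trace.
          return Trace.startSpan(description, Sampler.ALWAYS);
        }
        return Trace.startSpan(description, new TraceInfo(traceId, parentSpanId));
      }
    }

The idea is to call startChildSpan(...) in the task's setup() and close()
the returned TraceScope in cleanup() (step 5), while the driver closes its
own scope after job.waitForCompletion(true) returns (step 6).
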
This gave me some insight into when the Mappers and Reducers
started/finished and what they were doing in the middle.  My next step is
to verify my hypotheses on a cluster.

If I were to contribute this, I would design Input & Output Formats that
wrap other Input & Output Formats, except that they also set up and start
the tracing when they are created; a rough sketch of the input side is
below.  Unfortunately, since I'm on a deadline and in the middle of
performance debugging, I hard-coded this into my application instead.  Not
sure if I'll make the time to package it up.
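
The TracingInputFormat class and its delegate key below are hypothetical;
the Hadoop InputFormat/ReflectionUtils calls are real API, and MrTracing is
the placeholder helper sketched above.

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.util.ReflectionUtils;

    /** Hypothetical wrapper: delegates to another InputFormat, starting tracing first. */
    public class TracingInputFormat<K, V> extends InputFormat<K, V> {

      // Hypothetical key naming the wrapped InputFormat class (e.g. AccumuloInputFormat).
      public static final String DELEGATE_KEY = "tracing.inputformat.delegate";

      @SuppressWarnings({"unchecked", "rawtypes"})
      private InputFormat<K, V> getDelegate(Configuration conf) {
        Class<? extends InputFormat> clazz =
            conf.getClass(DELEGATE_KEY, null, InputFormat.class);
        return (InputFormat<K, V>) ReflectionUtils.newInstance(clazz, conf);
      }

      @Override
      public List<InputSplit> getSplits(JobContext context)
          throws IOException, InterruptedException {
        return getDelegate(context.getConfiguration()).getSplits(context);
      }

      @Override
      public RecordReader<K, V> createRecordReader(InputSplit split,
          TaskAttemptContext context) throws IOException, InterruptedException {
        // This runs inside the Mapper's JVM, so it is a convenient hook for
        // steps 3-4: initialize the receiver and join the driver's trace.
        // A real version would keep the returned TraceScope and close it when
        // the task finishes (step 5); that bookkeeping is omitted here.
        MrTracing.startChildSpan(context.getConfiguration(), "map-input");
        return getDelegate(context.getConfiguration()).createRecordReader(split, context);
      }
    }

The output side would wrap an OutputFormat the same way, and the driver
would set DELEGATE_KEY to the real format's class name when configuring
the job.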

Regards, Dylan

On Sat, Nov 26, 2016 at 11:21 AM, Colin McCabe <cmccabe@apache.org> wrote:

> P.S. If you're interested in contributing, adding this MapReduce ID to
> spans would be a cool project.  Also, converting accumulo to the latest
> version of HTrace (4.0) would be great.
>
> Colin
>
>
> On Sat, Nov 26, 2016, at 11:20, Colin McCabe wrote:
> > Hi Dylan,
> >
> > Thanks for trying out HTrace!  We haven't added HTrace support to
> > MapReduce yet.  Since MapReduce involves very long-running jobs, there
> > is some discussion about the best way to add HTrace support to it. It
> > doesn't really fit into the "one trace per request" model that HDFS
> > uses.  One promising proposal is to add a tag to all spans that are
> > created during a given mapreduce job, that contains an ID which can be
> > traced back to the MR job.
> >
> > best,
> > Colin
> >
> >
> > On Sat, Nov 26, 2016, at 05:17, Dylan Hutchison wrote:
> > > Hi folks,
> > >
> > > I am using HTrace 3 with Accumulo.  I would like to trace through a
> > > MapReduce program that uses Accumulo Input/Output formats.  Has anyone
> > > done
> > > this?  I am using Hadoop 2.7.2, HTrace 3.1.0, Accumulo 1.8.0.
> > >
> > > I confirm HTrace 3 is working with client java programs that scan
> > > Accumulo.
> > >
> > >
> > > I am not sure if Hadoop tracing is working. I added the ZooTraceClient
> > > configuration to Hadoop and added the relevant Accumulo jars to
> Hadoop's
> > > classpath, but I don't know if it worked.  (I see a new trace entry
> > > called
> > > ClientNamenodeProtocol that I never saw before, but it's not proof that
> > > Hadoop tracing is working.)
> > >
> > > I don't think the trace is being wrapped around the MapReduce
> mechanisms
> > > that exec Mappers and Reducers over Yarn.
> > >
> > > Maybe I can make it work by detaching the trace?  Would HTrace work if
> I
> > > detach a trace from one process, record the trace ID, send the trace ID
> > > to
> > > the mappers and reducers, and then re-attach at the mapper and reducer
> > > processes?
> > >
> > > Cheers, Dylan
>
