hadoop-common-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: [DISCUSS] Tracing in the Hadoop ecosystem
Date Tue, 21 Aug 2018 18:18:28 GMT
On Tue, Aug 21, 2018 at 10:09 AM Andrew Purtell <apurtell@apache.org> wrote:

> What if someone built an HTrace facade for Zipkin / Brave?


I like the idea but taking a look, HTrace does static dispatch. I was
thinking that precludes our being able to do a facade. I would love to hear
otherwise.
Thanks,
S
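
To make the static-dispatch concern concrete, here is a minimal Java sketch.
It does not use the real HTrace or Brave APIs: LegacyTrace, SpanBackend,
Scope, RecordingBackend, and Instrumented are all hypothetical names invented
for illustration. Callers of a static method bind to that class at compile
time, so a different implementation cannot be injected by subclassing. One
possible workaround, offered as an assumption rather than something this
thread confirms, is a shim class that keeps the same static signatures and
delegates to a pluggable backend, which an adapter for another tracer could
implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical scope handle; close() ends the span and throws nothing checked. */
interface Scope extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical pluggable backend; an adapter for another tracer would implement it. */
interface SpanBackend {
    Scope start(String name);
}

/**
 * Hypothetical stand-in for a statically dispatched tracing entry point.
 * Call sites bind to LegacyTrace.startSpan at compile time, so the only way
 * to change behavior without touching callers is inside this class.
 */
final class LegacyTrace {
    // Default backend does nothing: start returns a scope whose close is empty.
    private static final SpanBackend NOOP = name -> () -> { };
    private static final AtomicReference<SpanBackend> BACKEND =
        new AtomicReference<>(NOOP);

    private LegacyTrace() { }

    /** Swap the delegate at runtime; null restores the no-op backend. */
    static void setBackend(SpanBackend backend) {
        BACKEND.set(backend == null ? NOOP : backend);
    }

    /** Same static signature the instrumented code already calls. */
    static Scope startSpan(String name) {
        return BACKEND.get().start(name);
    }
}

/** Test backend that records span starts and ends in order. */
final class RecordingBackend implements SpanBackend {
    final List<String> events = new ArrayList<>();

    @Override
    public Scope start(String name) {
        events.add("start:" + name);
        return () -> events.add("end:" + name);
    }
}

/** Existing instrumented code: unchanged regardless of the backend in use. */
final class Instrumented {
    private Instrumented() { }

    static int readBlock() {
        try (Scope scope = LegacyTrace.startSpan("readBlock")) {
            return 42;
        }
    }
}
```

In this sketch the instrumented code never changes; only the backend behind
the static entry point does. The cost is one extra indirection per trace
point, and whether that is acceptable on hot paths would need measuring.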


> Hadoop, HBase,
> Phoenix, and other HTrace API users would still need to move away from
> embedding HTrace instrumentation points to whatever is the normal API of
> the accepted replacement, but such a facade would give you a drop in
> replacement requiring no code changes to currently shipping code lines, and
> some time to do a hopefully coordinated replacement involving all upstreams
> and downstreams. Just a thought. Zipkin / Brave has widespread adoption of
> that option and the impending incubation here at the ASF will make it quite
> attractive, I think.
>
>
> On Tue, Aug 21, 2018 at 7:50 AM Stack <stack@duboce.net> wrote:
>
> > On Tue, Aug 21, 2018 at 3:44 AM Tsuyoshi Ozawa <ozawa@apache.org> wrote:
> >
> > > Thanks for starting discussion, Stack.
> > >
> > > > Zipkin seems to be coming to the Apache Incubator. As Andrew
> > > > Purtell said on HADOOP-15566, it would be a good option since
> > > > there is no problem with its license.
> > > https://wiki.apache.org/incubator/ZipkinProposal
> > >
> > >
> > Yes. This is nice to see.
> >
> >
> >
> > > Stack, do you have any knowledge about differences between Zipkin and
> > > HTrace? Might measurable performance overhead be observed still in
> > > Zipkin?
> > >
> > >
> > I've not measured to see if disabled trace points are friction-free.
> > Perhaps someone else has?
> >
> >
> >
> > > > To decrease the overhead, we need to do additional work like ftrace,
> > > > the well-known DTrace-like tracer in the Linux kernel. If I
> > > > understand correctly, ftrace replaces its function calls with CPU
> > > > NOP instructions when it is disabled. This ensures lower overhead
> > > > from the tracer. By replacing the function calls for tracing with
> > > > the JVM's NOP operation, can we achieve minimal overhead?
> > >
> > >
> > That'd be ideal. Makes sense inside the kernel. But up in our sloppy java
> > context, we should be able to get away with something less exotic.
> >
> > Thanks Tsuyoshi,
> > S
> >
> >
> >
> >
> > > Regards
> > > - Tsuyoshi
> > > On Tue, Jul 31, 2018 at 9:59 AM Eric Yang <eyang@hortonworks.com> wrote:
> > > >
> > > > Most code coverage tools can instrument Java classes without any
> > > > source code changes, but tracing a distributed system is more
> > > > involved because code execution across network interactions is not
> > > > easy to match up. All interactions between sender and receiver have
> > > > some form of session id or sequence id. Hadoop had some logic to
> > > > assist the stitching of distributed interactions together in the
> > > > clienttrace log. This information seems to have been lost in the
> > > > last 5-6 years of Hadoop evolution. HTrace was invented to fill the
> > > > void left behind by clienttrace, as a programmable API to send out
> > > > useful tracing data for downstream analytical programs to visualize
> > > > the interactions.
> > > >
> > > > Large companies commonly enforce logging of the session id, and
> > > > write homebrew tools to stitch together debugging logic for a
> > > > specific piece of software. There is also a growing set of tools
> > > > from Splunk and similar companies for stitching the views together.
> > > > Hadoop does not seem to be at the top of the list for those
> > > > companies to implement tracing, because the Hadoop networking layer
> > > > is complex and changes more frequently than desired.
> > > >
> > > > If we go back to a logging approach instead of an API approach, it
> > > > will help someone write the analytical program someday. The danger
> > > > of the logging approach is that it is boring to write LOG.debug()
> > > > everywhere, so we often forget about it, and log entries get
> > > > removed.
> > > >
> > > > The API approach can work if real-time interactive tracing can be
> > > > done. However, this is hard to realize in Hadoop because a massive
> > > > amount of parallel data is difficult to aggregate in real time
> > > > without hitting timeouts. It is also more likely to require changes
> > > > to the network protocol, which might cause more headache than it's
> > > > worth. I am in favor of removing HTrace support and redoing
> > > > distributed tracing using the logging approach.
> > > >
> > > > Regards,
> > > > Eric
> > > >
> > > > On 7/30/18, 3:06 PM, "Stack" <stack@duboce.net> wrote:
> > > >
> > > >     There is a healthy discussion going on over in HADOOP-15566 on
> > > >     tracing in the Hadoop ecosystem. It would sit better on a
> > > >     mailing list than in comments up on JIRA so here's an attempt
> > > >     at porting the chat here.
> > > >
> > > >     Background/Context: Bits of Hadoop and HBase had Apache HTrace
> > > >     trace points added. HTrace was formerly "incubating" at Apache
> > > >     but has since been retired, moved to Apache Attic. HTrace and
> > > >     the efforts at instrumenting Hadoop wilted for want of
> > > >     attention/resourcing. Our Todd Lipcon noticed that the HTrace
> > > >     instrumentation can add friction on some code paths so can
> > > >     actually be harmful even when disabled. The natural follow-on
> > > >     is that we should rip out tracings of a "dead" project. This
> > > >     then raises the question, should something replace it and if
> > > >     so what? This is where HADOOP-15566 is at currently.
> > > >
> > > >     HTrace took two or three runs, led by various Heroes, at
> > > >     building a trace lib for Hadoop (first). It was trying to build
> > > >     the trace lib, a store, and a visualizer. Always, it had a
> > > >     mechanism for dumping the traces out to external systems for
> > > >     storage and viewing (e.g. Zipkin). HTrace started when there
> > > >     was little else but the, you guessed it, Google paper that
> > > >     described the Dapper system they had internally. Since then,
> > > >     the world of tracing has come on in leaps and bounds with
> > > >     healthy alternatives, communities, and even commercialization.
> > > >
> > > >     If interested, take a read over HADOOP-15566. Will try and
> > > >     encourage participants to move the chat here.
> > > >
> > > >     Thanks,
> > > >     St.Ack
> > > >
> > > >
> > > >     ---------------------------------------------------------------------
> > > >     To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
> > > >     For additional commands, e-mail: common-dev-help@hadoop.apache.org
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
