htrace-dev mailing list archives

From Colin McCabe <cmcc...@apache.org>
Subject Re: [DISCUSS] Attic podling Apache HTrace?
Date Thu, 17 Aug 2017 21:21:28 GMT
On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> What about OpenTracing (http://opentracing.io/)? Is this the successor
> project to ZipKin? In particular grpc-opentracing (
> https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> fulfill in open source the tracing architecture described in the Dapper
> paper.

OpenTracing is essentially an API which sits on top of another tracing
system.

So you can instrument your code with the OpenTracing library, and then
have that send the trace spans to OpenZipkin.
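To illustrate the layering described above, here is a minimal sketch. It does not use the real OpenTracing or Zipkin APIs; Tracer, SpanSink, InMemorySink, StdoutSink and handle_request are all hypothetical names invented for this example. The point is only that application code talks to a thin API, and the backend that receives the spans can be swapped without touching that code.

```python
from typing import Dict, List, Protocol


class SpanSink(Protocol):
    """Hypothetical backend interface: anything that can accept a finished span."""

    def report(self, span: Dict[str, str]) -> None: ...


class InMemorySink:
    """Backend that keeps spans in a list (stands in for a real tracing system)."""

    def __init__(self) -> None:
        self.spans: List[Dict[str, str]] = []

    def report(self, span: Dict[str, str]) -> None:
        self.spans.append(span)


class StdoutSink:
    """Alternative backend that just prints each span."""

    def report(self, span: Dict[str, str]) -> None:
        print(span)


class Tracer:
    """Thin instrumentation API; it forwards spans to whichever sink it was given."""

    def __init__(self, sink: SpanSink) -> None:
        self._sink = sink

    def trace(self, operation: str) -> None:
        self._sink.report({"operation": operation})


def handle_request(tracer: Tracer) -> None:
    # Application code depends only on Tracer, never on a concrete backend.
    tracer.trace("handle_request")
```

Switching from InMemorySink to StdoutSink changes only the line that constructs the Tracer; handle_request stays the same.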

Here are some thoughts on this topic from a Zipkin developer:
https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
Adrian Cole can probably chime in here as well.

In general the OpenTracing folks have been friendly and respectful.  (If
any of them are reading this, I apologize for not following some of the
discussions on gitter more thoroughly-- my time is just split so many
ways right now!)

> 
> If one takes a step back and looks at all of the hand rolled RPC stacks
> in
> the Hadoop ecosystem it's a mess. It is a heavier lift but getting
> everyone
> migrated to a single RPC stack - gRPC - would provide the unified tracing
> layer envisioned by HTrace. The tracing integration is then done exactly
> in
> one place. In contrast HTrace requires all of the components to sprinkle
> spans throughout the application code.
> 

That's not the issue.  We already have HTrace integration with Hadoop
RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
system is actually very straightforward-- you just add two fields to the
base RPC request definition, and patch the RPC system to use them.
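As a rough sketch of the "two fields" idea, assuming the two fields are a trace ID and a parent span ID (the message does not name them): the client copies its current span's identifiers into the request, and the server starts a child span from them. This is not Hadoop RPC or HTrace code; RpcRequest, Span, client_call and server_handle are hypothetical names used only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RpcRequest:
    method: str
    payload: bytes
    # The two added fields (assumed names); None means the call is not traced.
    trace_id: Optional[int] = None
    parent_span_id: Optional[int] = None


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_span_id: Optional[int]
    description: str


def new_id() -> int:
    """Random 64-bit identifier for a span."""
    return random.getrandbits(64)


def client_call(current: Span, method: str, payload: bytes) -> RpcRequest:
    # Client side: stamp the request with the caller's trace and span IDs.
    return RpcRequest(method, payload, current.trace_id, current.span_id)


def server_handle(request: RpcRequest) -> Optional[Span]:
    # Server side: if the request carries trace fields, open a child span.
    if request.trace_id is None:
        return None
    return Span(request.trace_id, new_id(), request.parent_span_id, request.method)
```

With this in place, every RPC on a traced path yields a server-side span whose parent is the client span that issued the call.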

Just instrumenting RPC is not sufficient.  You need programmers to add
explicit span annotations to your code so that you can have useful
information beyond what a program like wireshark would find.  Things
like what disk is a request hitting, what HBase PUT is an HDFS write
associated with, and so forth.
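A small sketch of what an explicit span annotation might look like in application code. It assumes a hypothetical span() helper and an in-process FINISHED_SPANS list rather than the actual HTrace API; write_block and the disk name in the usage note are likewise made up. The annotations carry details (which disk, which block) that could not be recovered from the RPC traffic alone.

```python
import time
from contextlib import contextmanager

# Hypothetical in-process store of completed spans, for illustration only.
FINISHED_SPANS = []


@contextmanager
def span(description, **annotations):
    """Record a timed span with key/value annotations around a block of code."""
    record = {
        "description": description,
        "annotations": dict(annotations),
        "start": time.time(),
    }
    try:
        yield record
    finally:
        # Always close the span, even if the wrapped code raises.
        record["end"] = time.time()
        FINISHED_SPANS.append(record)


def write_block(block_id, data, disk):
    # The programmer states which block and which disk this write touches.
    with span("write_block", block_id=block_id, disk=disk) as current:
        current["annotations"]["bytes"] = len(data)
        return len(data)
```

For example, write_block(7, b"abc", "/dev/sdb") leaves one record in FINISHED_SPANS whose annotations name the block, the disk, and the byte count.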

Also, this is getting off topic, but there is a new RPC system every
year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
REST/JSON, etc.  They all have advantages and disadvantages.  For
example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
and performance problems with the protobuf-java library.  I wish GRPC
luck, but I think it's good for people to experiment with different
libraries.  It doesn't make sense to try to force everyone to use one
thing, even if we could.

> The Hadoop ecosystem is always partially at odds with itself, if for no
> other reason than there is no shared vision among the projects. There are
> no coordinated releases. There isn't even agreement on which version of
> shared dependencies to use (hence the recurring pain in various places
> with
> downstream version changes of protobuf, guava, jackson, etc. etc).
> Therefore HTrace is severely constrained on what API changes can be made.
> Unfortunately the different major versions of HTrace do not interoperate
> at
> all. And are not even source compatible. While this is not unreasonable at all
> for a project in incubation, when combined with the inability of the
> Hadoop
> ecosystem to coordinate releases as a cross-cutting dependency ships a
> new
> version, this has reduced the utility of HTrace to effectively nil for
> the
> average user. I am sorry to say that. Only a commercial Hadoop vendor or
> power user can be expected to patch and build a stack that actually
> works.

One correction: The different major versions of HTrace are indeed source
code compatible.  You can build an application that can use both HTrace
3 and HTrace 4.  This was absolutely essential for us because of the
version skew issues you mention.

> On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <lewismc@apache.org> wrote:
> 
> > Hi Mike,
> > I think this is a fair question. We've probably all been associated with
> > projects which just don't really make it. It would appear that HTrace is
> > one of them. This is not to say that there is nothing going on with the
> > tracing effort generally (as there is) but it looks like HTrace as a
> > project may be headed to the Attic.
> > I suppose the response to this thread will determine what happens...

Thanks, Lewis.

I think maybe we should try to identify the top tracing priorities for
HBase and HDFS and see how HTrace / OpenTracing / OpenZipkin could fit
into those.  Just start from a nice crisp set of requirements, like
Stack suggested, and think about how we could make those a reality.  If
we can advance the state of tracing in hadoop, that will be a good thing
for our users, even if htrace goes to the attic.  I've been mostly
working on Apache Kafka these days but I could drop by to brainstorm.

best,
Colin


> > Lewis
> >
> >
> >
> > On Wed, Aug 16, 2017 at 10:01 AM, <
> > dev-digest-help@htrace.incubator.apache.org> wrote:
> >
> > >
> > > From: Mike Drob <mdrob@apache.org>
> > > To: dev@htrace.incubator.apache.org
> > > Cc:
> > > Bcc:
> > > Date: Wed, 16 Aug 2017 12:00:49 -0500
> > > Subject: [DISCUSS] Attic podling Apache HTrace?
> > > Hi folks,
> > >
> > > Want to bring up a potentially uncomfortable topic for some. Is it time to
> > > retire/attic the project?
> > >
> > > We've seen a minimal amount of activity in the past year. The last
> > release
> > > had two bug fixes, and had been pending for several months before
> > somebody
> > > reminded me to push the artifacts to subversion from the staging
> > directory.
> > >
> > > I'd love to see a renewed set of activity here, but I don't think there
> > is
> > > a ton of interest going on.
> > >
> > > HBase is still on version 3. So is Accumulo, I think. Hadoop is on 4.1,
> > > which is a good sign, but I haven't heard much from them recently. I
> > > definitely do not think we are at the point where a lack of releases and
> > > activity is a sign of super advanced maturity and stability.
> > >
> > > Your thoughts?
> > >
> > > Mike
> > >
> > >
> >
> >
> > --
> > http://home.apache.org/~lewismc/
> > @hectorMcSpector
> > http://www.linkedin.com/in/lmcgibbney
> >
> 
> 
> 
> -- 
> Best regards,
> Andrew
> 
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
