htrace-dev mailing list archives

From Colin McCabe <cmcc...@apache.org>
Subject Re: [DISCUSS] Attic podling Apache HTrace?
Date Thu, 17 Aug 2017 21:57:41 GMT
On Thu, Aug 17, 2017, at 14:40, Andrew Purtell wrote:
> > That's not the issue.  We already have HTrace integration with Hadoop
> RPC, such that a Hadoop RPC creates a span.
> 
> This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
> uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
> any stretch. There are some of those around. Some have tracing built in.
> They take some of the oxygen out of the room. I think that is a fair
> point when thinking about the viability of a podling that sees little activity
> as it is.

Yeah-- maybe we should integrate HTrace into HBase RPC as well.

I don't think RPC-specific trace systems have been strong competitors.
 Since the RPC landscape is so fragmented, those systems tend to not get
used by many people.  Our strongest open source competitors, OpenTracing
and OpenZipkin, support multiple RPC systems.  (Zipkin originally was
specific to Finagle, but that is no longer true.)

> I didn't come here to suggest HTrace go away, though. I came to raise a
> few points on why adoption and use of HTrace has very likely suffered from
> usability problems. These problems are still not completely resolved.
> Stack describes HTrace integration with HBase as broken. My experience has been
> I have to patch POMs, and patch HDFS, HBase, and Phoenix code, to get
> anything that works at all. I also sought to tie some of those problems
> to ecosystem issues because I know it is hard. For what it's worth, thanks.

I think you make some very good points about the difficulty of doing
cross-project coordination.  One thing that really held back HTrace 4.0
was that it was originally scheduled to be part of Hadoop 2.8-- and the
Hadoop 2.8 release was delayed for a really, really long time, to the
point where it almost became a punchline.  So people had to use vendor
releases to get HTrace 4, because those were the only releases with new
Hadoop code.

Colin


> 
> 
> 
> On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmccabe@apache.org> wrote:
> 
> > On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > > project to Zipkin? In particular grpc-opentracing (
> > > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > > fulfill in open source the tracing architecture described in the Dapper
> > > paper.
> >
> > OpenTracing is essentially an API which sits on top of another tracing
> > system.
> >
> > So you can instrument your code with the OpenTracing library, and then
> > have that send the trace spans to OpenZipkin.
> >
> > Here are some thoughts about this topic from a Zipkin developer:
> > https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> > Probably Adrian Cole can chime in here as well.
> >
> > In general the OpenTracing folks have been friendly and respectful.  (If
> > any of them are reading this, I apologize for not following some of the
> > discussions on gitter more thoroughly-- my time is just split so many
> > ways right now!)
> >
> > >
> > > If one takes a step back and looks at all of the hand rolled RPC
> > > stacks in the Hadoop ecosystem it's a mess. It is a heavier lift but
> > > getting everyone migrated to a single RPC stack - gRPC - would provide
> > > the unified tracing layer envisioned by HTrace. The tracing integration
> > > is then done exactly in one place. In contrast HTrace requires all of
> > > the components to sprinkle spans throughout the application code.
> > >
> >
> > That's not the issue.  We already have HTrace integration with Hadoop
> > RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> > system is actually very straightforward-- you just add two fields to the
> > base RPC request definition, and patch the RPC system to use them.
> >
> > Just instrumenting RPC is not sufficient.  You need programmers to add
> > explicit span annotations to your code so that you can have useful
> > information beyond what a program like wireshark would find.  Things
> > like what disk is a request hitting, what HBase PUT is an HDFS write
> > associated with, and so forth.
> >
> > Also, this is getting off topic, but there is a new RPC system every
> > year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, gRPC,
> > REST/JSON, etc.  They all have advantages and disadvantages.  For
> > example, gRPC depends on protobuf-- and Hadoop has a lot of deployment
> > and performance problems with the protobuf-java library.  I wish gRPC
> > luck, but I think it's good for people to experiment with different
> > libraries.  It doesn't make sense to try to force everyone to use one
> > thing, even if we could.
> >
> > > The Hadoop ecosystem is always partially at odds with itself, if for no
> > > other reason than there is no shared vision among the projects. There are
> > > no coordinated releases. There isn't even agreement on which version of
> > > shared dependencies to use (hence the recurring pain in various places
> > > with downstream version changes of protobuf, guava, jackson, etc.).
> > > Therefore HTrace is severely constrained on what API changes can be made.
> > > Unfortunately the different major versions of HTrace do not interoperate
> > > at all, and are not even source compatible. While this is not
> > > unreasonable at all for a project in incubation, when combined with the
> > > inability of the Hadoop ecosystem to coordinate releases as a
> > > cross-cutting dependency ships a new version, this has reduced the
> > > utility of HTrace to effectively nil for the average user. I am sorry to
> > > say that. Only a commercial Hadoop vendor or power user can be expected
> > > to patch and build a stack that actually works.
> >
> > One correction: The different major versions of HTrace are indeed source
> > code compatible.  You can build an application that can use both HTrace
> > 3 and HTrace 4.  This was absolutely essential for us because of the
> > version skew issues you mention.
> >
> > > On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <lewismc@apache.org> wrote:
> > >
> > > > Hi Mike,
> > > > I think this is a fair question. We've probably all been associated
> > > > with projects which just don't really make it. It would appear that
> > > > HTrace is one of them. This is not to say that there is nothing going
> > > > on with the tracing effort generally (as there is) but it looks like
> > > > HTrace as a project may be headed to the Attic.
> > > > I suppose the response to this thread will determine what happens...
> >
> > Thanks, Lewis.
> >
> > I think maybe we should try to identify the top tracing priorities for
> > HBase and HDFS and see how HTrace / OpenTracing / OpenZipkin could fit
> > into those.  Just start from a nice crisp set of requirements, like
> > Stack suggested, and think about how we could make those a reality.  If
> > we can advance the state of tracing in Hadoop, that will be a good thing
> > for our users, even if HTrace goes to the attic.  I've been mostly
> > working on Apache Kafka these days but I could drop by to brainstorm.
> >
> > best,
> > Colin
> >
> >
> > > > Lewis
> > > >
> > > >
> > > >
> > > > On Wed, Aug 16, 2017 at 10:01 AM, <
> > > > dev-digest-help@htrace.incubator.apache.org> wrote:
> > > >
> > > > >
> > > > > From: Mike Drob <mdrob@apache.org>
> > > > > To: dev@htrace.incubator.apache.org
> > > > > Cc:
> > > > > Bcc:
> > > > > Date: Wed, 16 Aug 2017 12:00:49 -0500
> > > > > Subject: [DISCUSS] Attic podling Apache HTrace?
> > > > > Hi folks,
> > > > >
> > > > > Want to bring up a potentially uncomfortable topic for some. Is it
> > > > > time to retire/attic the project?
> > > > >
> > > > > We've seen a minimal amount of activity in the past year. The last
> > > > > release had two bug fixes, and had been pending for several months
> > > > > before somebody reminded me to push the artifacts to subversion from
> > > > > the staging directory.
> > > > >
> > > > > I'd love to see a renewed set of activity here, but I don't think
> > > > > there is a ton of interest going on.
> > > > >
> > > > > HBase is still on version 3. So is Accumulo, I think. Hadoop is on
> > > > > 4.1, which is a good sign, but I haven't heard much from them
> > > > > recently. I definitely do not think we are at the point where a lack
> > > > > of releases and activity is a sign of super advanced maturity and
> > > > > stability.
> > > > >
> > > > > Your thoughts?
> > > > >
> > > > > Mike
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > http://home.apache.org/~lewismc/
> > > > @hectorMcSpector
> > > > http://www.linkedin.com/in/lmcgibbney
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> >
> 
> 
> 
> -- 
> Best regards,
> Andrew
> 
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
