htrace-dev mailing list archives

From Adrian Cole <adrian.f.c...@gmail.com>
Subject Re: [DISCUSS] Attic podling Apache HTrace?
Date Fri, 18 Aug 2017 00:56:27 GMT
Just speaking on the OpenTracing vs whatever part. What Colin mentioned is
correct. It is a library API defined for tracing, not an implementation
of a tracer or a backend.
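
To make the API-versus-implementation distinction concrete, here is a minimal
Python sketch. All names in it (Span, Tracer, InMemoryTracer, handle_request)
are hypothetical and chosen for illustration; this is not the actual
OpenTracing API, only the general shape of the idea.

```python
from abc import ABC, abstractmethod
import time


class Span(ABC):
    """Hypothetical span interface: all that instrumented code sees."""

    @abstractmethod
    def set_tag(self, key, value): ...

    @abstractmethod
    def finish(self): ...


class Tracer(ABC):
    """Hypothetical tracer interface: an API only, with no backend."""

    @abstractmethod
    def start_span(self, name): ...


class InMemorySpan(Span):
    def __init__(self, name, sink):
        self.name = name
        self.tags = {}
        self.start = time.time()
        self.end = None
        self._sink = sink  # list owned by the tracer that created this span

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.end = time.time()
        self._sink.append(self)


class InMemoryTracer(Tracer):
    """One possible implementation; a real one would report to a backend."""

    def __init__(self):
        self.finished = []

    def start_span(self, name):
        return InMemorySpan(name, self.finished)


def handle_request(tracer, path):
    """Instrumented code: depends on the Tracer interface only."""
    span = tracer.start_span("handle_request")
    span.set_tag("path", path)
    try:
        return "ok:" + path
    finally:
        span.finish()
```

In this sketch, swapping InMemoryTracer for an implementation that reports to
some backend would not require touching handle_request.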

That said, there are certain backends that are preferred, notably LightStep
and Jaeger (by Uber). This is because folks from those projects did most of
the defining, even if others do participate. This affects the view of what
tracing is inside OT. Notably, both hold that logging is tracing (e.g. it is
OK and sometimes encouraged to push system logs into a span). These opinions
are sometimes promoted through presentations etc., which might make it a
better or worse fit as an HTrace replacement. For example, most in Zipkin
are not keen on escalating it to a logging system as it was not designed
for this, and similarly to here, we couldn't afford to accept more
responsibility like that.

HTrace is almost never mentioned in OpenTracing discussions except when I
do. That by itself has been troubling to me: if OpenTracing were meant to be
neutral, HTrace should have been mentioned constantly and should have
influenced the design. Anyway..

The "actual Dapper" team, which is called Census, has spun up and is moving
fast. It has no backend yet, but most can or soon will report to Zipkin.
https://github.com/census-instrumentation

Most important to all of this imho is that the jury is out on whether
instrumentation libraries are indeed shared. For example, even though
Amazon, Microsoft, Dynatrace, AppDynamics, New Relic, Facebook etc. all know
about OpenTracing, it isn't what they are using as a core API. In some cases
it is because they have an event layer instead, and in others it is that
they prefer a data type approach as opposed to a dictated library
interface. Some in OpenTracing have struggled to influence the project
around points they have felt important, notably propagation, and wrote
their own bespoke layers or wrappers to handle it properly. Some of this is
fixable in OT, but imho the change dynamics, culture and leadership have
not changed since inception.

Many Zipkin users use OpenTracing libraries, probably due to the high level
of staff and marketing OpenTracing has behind the effort. For example, Red
Hat staff write a lot of things faster than volunteers can. That said, many
Zipkin users prefer existing, especially well attended, libraries by the
project or ecosystem. Looking at GitHub, native adoption is far greater than
OT adoption. In many cases, users still roll their own. This is not the same
as lacking a complete choice: developers can, do and continue to write their
own code if given a spec on how to do it. This is also true in OpenTracing,
except there you need to know both the abstraction and the backend to write
custom code.

OpenTracing is in CNCF now, as is their preferred system Jaeger. As far as
I know you wouldn't also be in ASF, but I don't know if that matters. Census
is likely to be CNCF because Google (but I have no insight, just a guess).
Zipkin is on hold wrt foundation; we didn't have enough oomph to get to one
last year, so the jury is still out.

Personally, I think Census have a lot of things right, e.g. separation of
concerns between logging, metrics, tracing and propagation. That said, I
think all could learn from HTrace or collaborate regardless of this outcome.

On 18 Aug 2017 05:57, "Colin McCabe" <cmccabe@apache.org> wrote:

On Thu, Aug 17, 2017, at 14:40, Andrew Purtell wrote:
> > That's not the issue.  We already have HTrace integration with Hadoop
> > RPC, such that a Hadoop RPC creates a span.
>
> This is an issue. I'm glad Hadoop RPC is covered, but nobody but Hadoop
> uses it. Likewise, HBase RPC. These are not general purpose RPC stacks by
> any stretch. There are some of those around. Some have tracing built in.
> They take some of the oxygen out of the room. I think that is a fair
> point when thinking about the viability of a podling that sees little
> activity as it is.

Yeah-- maybe we should integrate HTrace into HBase RPC as well.

I don't think RPC-specific trace systems have been strong competitors.
 Since the RPC landscape is so fragmented, those systems tend to not get
used by many people.  Our strongest open source competitors, OpenTracing
and OpenZipkin, support multiple RPC systems.  (Zipkin originally was
specific to Finagle, but that is no longer true.)

> I didn't come here to suggest HTrace go away, though. I came to raise a
> few points on why adoption and use of HTrace has very likely suffered from
> usability problems. These problems are still not completely resolved.
> Stack describes HTrace integration with HBase as broken. My experience
> has been I have to patch POMs, and patch HDFS, HBase, and Phoenix code, to
> get anything that works at all. I also sought to tie some of those
> problems to ecosystem issues because I know it is hard. For what it's
> worth, thanks.

I think you make some very good points about the difficulty of doing
cross-project coordination.  One thing that really held back HTrace 4.0
was that it was originally scheduled to be part of Hadoop 2.8-- and the
Hadoop 2.8 release was delayed for a really, really long time, to the
point where it almost became a punchline.  So people had to use vendor
releases to get HTrace 4, because those were the only releases with new
Hadoop code.

Colin


>
>
>
> On Thu, Aug 17, 2017 at 2:21 PM, Colin McCabe <cmccabe@apache.org> wrote:
>
> > On Thu, Aug 17, 2017, at 12:25, Andrew Purtell wrote:
> > > What about OpenTracing (http://opentracing.io/)? Is this the successor
> > > project to ZipKin? In particular grpc-opentracing (
> > > https://github.com/grpc-ecosystem/grpc-opentracing) seems to finally
> > > fulfill in open source the tracing architecture described in the
> > > Dapper paper.
> >
> > OpenTracing is essentially an API which sits on top of another tracing
> > system.
> >
> > So you can instrument your code with the OpenTracing library, and then
> > have that send the trace spans to OpenZipkin.
> >
> > Here are some thoughts about this topic from a Zipkin developer:
> > https://gist.github.com/wu-sheng/b8d51dda09d3ce6742630d1484fd55c7#what-is-the-relationship-between-zipkin-and-opentracing
> > Probably Adrian Cole can chime in here as well.
> >
> > In general the OpenTracing folks have been friendly and respectful.  (If
> > any of them are reading this, I apologize for not following some of the
> > discussions on gitter more thoroughly-- my time is just split so many
> > ways right now!)
> >
> > >
> > > If one takes a step back and looks at all of the hand rolled RPC
> > > stacks in the Hadoop ecosystem it's a mess. It is a heavier lift but
> > > getting everyone migrated to a single RPC stack - gRPC - would provide
> > > the unified tracing layer envisioned by HTrace. The tracing
> > > integration is then done exactly in one place. In contrast HTrace
> > > requires all of the components to sprinkle spans throughout the
> > > application code.
> > >
> >
> > That's not the issue.  We already have HTrace integration with Hadoop
> > RPC, such that a Hadoop RPC creates a span.  Integration with any RPC
> > system is actually very straightforward-- you just add two fields to the
> > base RPC request definition, and patch the RPC system to use them.
> >
> > Just instrumenting RPC is not sufficient.  You need programmers to add
> > explicit span annotations to your code so that you can have useful
> > information beyond what a program like wireshark would find.  Things
> > like what disk is a request hitting, what HBase PUT is an HDFS write
> > associated with, and so forth.
> >
> > Also, this is getting off topic, but there is a new RPC system every
> > year or two.  Java-RMI, CORBA, Thrift, Akka, SOAP, KRPC, Finagle, GRPC,
> > REST/JSON, etc.  They all have advantages and disadvantages.  For
> > example, GRPC depends on protobuf-- and Hadoop has a lot of deployment
> > and performance problems with the protobuf-java library.  I wish GRPC
> > luck, but I think it's good for people to experiment with different
> > libraries.  It doesn't make sense to try to force everyone to use one
> > thing, even if we could.
> >
> > > The Hadoop ecosystem is always partially at odds with itself, if for
> > > no other reason than there is no shared vision among the projects.
> > > There are no coordinated releases. There isn't even agreement on which
> > > version of shared dependencies to use (hence the recurring pain in
> > > various places with downstream version changes of protobuf, guava,
> > > jackson, etc. etc). Therefore HTrace is severely constrained on what
> > > API changes can be made. Unfortunately the different major versions of
> > > HTrace do not interoperate at all. And are not even source compatible.
> > > While this is not unreasonable at all for a project in incubation, when
> > > combined with the inability of the Hadoop ecosystem to coordinate
> > > releases as a cross-cutting dependency ships a new version, this has
> > > reduced the utility of HTrace to effectively nil for the average user.
> > > I am sorry to say that. Only a commercial Hadoop vendor or power user
> > > can be expected to patch and build a stack that actually works.
> >
> > One correction: The different major versions of HTrace are indeed source
> > code compatible.  You can build an application that can use both HTrace
> > 3 and HTrace 4.  This was absolutely essential for us because of the
> > version skew issues you mention.
> >
> > > On Thu, Aug 17, 2017 at 11:04 AM, lewis john mcgibbney <
> > lewismc@apache.org> wrote:
> > >
> > > > Hi Mike,
> > > > I think this is a fair question. We've probably all been associated
> > > > with projects which just don't really make it. It would appear that
> > > > HTrace is one of them. This is not to say that there is nothing going
> > > > on with the tracing effort generally (as there is) but it looks like
> > > > HTrace as a project may be headed to the Attic.
> > > > I suppose the response to this thread will determine what happens...
> >
> > Thanks, Lewis.
> >
> > I think maybe we should try to identify the top tracing priorities for
> > HBase and HDFS and see how HTrace / OpenTracing / OpenZipkin could fit
> > into those.  Just start from a nice crisp set of requirements, like
> > Stack suggested, and think about how we could make those a reality.  If
> > we can advance the state of tracing in hadoop, that will be a good thing
> > for our users, even if htrace goes to the attic.  I've been mostly
> > working on Apache Kafka these days but I could drop by to brainstorm.
> >
> > best,
> > Colin
> >
> >
> > > > Lewis
> > > >
> > > >
> > > >
> > > > On Wed, Aug 16, 2017 at 10:01 AM, <
> > > > dev-digest-help@htrace.incubator.apache.org> wrote:
> > > >
> > > > >
> > > > > From: Mike Drob <mdrob@apache.org>
> > > > > To: dev@htrace.incubator.apache.org
> > > > > Cc:
> > > > > Bcc:
> > > > > Date: Wed, 16 Aug 2017 12:00:49 -0500
> > > > > Subject: [DISCUSS] Attic podling Apache HTrace?
> > > > > Hi folks,
> > > > >
> > > > > Want to bring up a potentially uncomfortable topic for some. Is it
> > > > > time to retire/attic the project?
> > > > >
> > > > > We've seen a minimal amount of activity in the past year. The last
> > > > > release had two bug fixes, and had been pending for several months
> > > > > before somebody reminded me to push the artifacts to subversion from
> > > > > the staging directory.
> > > > >
> > > > > I'd love to see a renewed set of activity here, but I don't think
> > > > > there is a ton of interest going on.
> > > > >
> > > > > HBase is still on version 3. So is Accumulo, I think. Hadoop is on
> > > > > 4.1, which is a good sign, but I haven't heard much from them
> > > > > recently. I definitely do not think we are at the point where a lack
> > > > > of releases and activity is a sign of super advanced maturity and
> > > > > stability.
> > > > >
> > > > > Your thoughts?
> > > > >
> > > > > Mike
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > http://home.apache.org/~lewismc/
> > > > @hectorMcSpector
> > > > http://www.linkedin.com/in/lmcgibbney
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
