hadoop-common-issues mailing list archives

From "BOGDAN DRUTU (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15566) Remove HTrace support
Date Mon, 30 Jul 2018 15:27:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562025#comment-16562025 ]

BOGDAN DRUTU commented on HADOOP-15566:
---------------------------------------

Hello all,

First, sorry for jumping into this issue. I will try to be brief (edited after I finished
the comment: I was wrong) and as project-independent as possible (for the record, I am one
of the main contributors to OpenCensus; in my previous life I also debugged a lot of Bigtable
issues using the same technology as OpenCensus).

Some replies to other comments in this issue:

[~bensigelman] - FYI: OpenCensus does not enforce any wire format. The format is configurable,
and we are adding support for the W3C standard.
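
To make "the format is configurable" concrete, here is a minimal Java sketch, assuming a hypothetical `WireFormat` interface and `SpanContext` holder (illustrative names only, not OpenCensus's actual API). It injects the same span context either as a W3C-style `traceparent` header or as Zipkin's B3 headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-Sampled`), with the choice driven by a configuration string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal span context: only the fields a wire format must carry.
final class SpanContext {
    final String traceId; // 32 lowercase hex chars
    final String spanId;  // 16 lowercase hex chars
    final boolean sampled;

    SpanContext(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }
}

// Hypothetical pluggable format: how a SpanContext is written into request headers.
interface WireFormat {
    void inject(SpanContext ctx, Map<String, String> headers);
}

// W3C Trace Context style: a single "traceparent" header, version-traceid-parentid-flags.
final class W3cFormat implements WireFormat {
    public void inject(SpanContext ctx, Map<String, String> headers) {
        headers.put("traceparent",
            "00-" + ctx.traceId + "-" + ctx.spanId + "-" + (ctx.sampled ? "01" : "00"));
    }
}

// Zipkin B3 "multi-header" style: one header per field.
final class B3Format implements WireFormat {
    public void inject(SpanContext ctx, Map<String, String> headers) {
        headers.put("X-B3-TraceId", ctx.traceId);
        headers.put("X-B3-SpanId", ctx.spanId);
        headers.put("X-B3-Sampled", ctx.sampled ? "1" : "0");
    }
}

final class WireFormats {
    // Select the format from configuration instead of hard-coding one.
    static WireFormat fromConfig(String name) {
        switch (name.toLowerCase(Locale.ROOT)) {
            case "w3c":
                return new W3cFormat();
            case "b3":
                return new B3Format();
            default:
                throw new IllegalArgumentException("unknown wire format: " + name);
        }
    }

    public static void main(String[] args) {
        SpanContext ctx = new SpanContext(
            "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        for (String name : new String[] {"w3c", "b3"}) {
            Map<String, String> headers = new HashMap<>();
            fromConfig(name).inject(ctx, headers);
            System.out.println(name + " -> " + headers);
        }
    }
}
```

Because callers depend only on the interface, switching formats becomes a configuration change rather than a code change; a real implementation would also need the matching extract side.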

[~elek] - About OpenTracing (OT) vs. OpenCensus (OC): in my personal opinion, the difference lies in the philosophy behind these projects. OT was designed
with the mindset of being an open-source API for vendors to implement, and because of this certain
tradeoffs were made to help some vendors (as [~michaelsembwever] mentioned). OC was designed
to be a fully implemented library that supports multiple different backends (Zipkin, Jaeger,
Stackdriver, AppInsight, etc.) as well as in-process debugging capabilities. For example, one
of the key features that I used a lot when debugging Bigtable issues is what OpenCensus calls
z-pages (in-process handlers to track active requests, in-memory latency-based sampled spans,
stats, etc.). You can take a look here: [https://opencensus.io/core-concepts/z-pages/#1].

Based on my limited experience, there are 3 components that are critical in the instrumentation
of a service:
 # Wire propagation (I saw a previous discussion about this). [https://github.com/w3c/distributed-tracing] is
a W3C standard proposed by a couple of APM vendors and cloud providers. Even though the
format is mostly focused on HTTP requests, HBase can define its own format if needed; the only
requirement is the ability to propagate all the fields defined in the format (trace-id, span-id,
trace-options and tracestate). This part is critical when HBase is used as a service (e.g.
something like Google Bigtable, which works with the HBase client): having standard fields
that are propagated allows service owners to correlate incoming requests from a customer
with the internal trace. A similar issue may occur when only HDFS is used as a service.
 # APIs to start/end a span, record tracing events, etc. There are multiple open-source APIs,
including OpenCensus, OpenTracing, Zipkin, etc.
 # In-process propagation. This can be implemented in two ways: explicitly propagating the
current "Span" between function calls, Runnables, Callables, etc., or implicitly, usually via
a thread-local mechanism. Regarding the previous comment from [~stack] about keeping this working,
my personal experience is that you can achieve it with the "implicit" mechanism by having
a clean context API (as an example of a context API that works well, I can recommend
gRPC's [https://grpc.io/grpc-java/javadoc/io/grpc/Context.html]) and by ensuring that
all async calls are wrapped accordingly (e.g. wrapping all Executors). The "explicit" mechanism
may be very hard to maintain and, in my experience, annoying for developers. This part
is very important when instrumenting the HBase client (which I think should be instrumented
in order to debug more complex issues), because the client is used as a library, and a standard
way to propagate the current Span is needed in order to continue the same trace between the
client application and the Bigtable client.
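
As a rough illustration of item 1, the Java sketch below parses and validates a `traceparent` header and forwards `tracestate` untouched. It assumes the version-00 layout from the W3C Trace Context specification (`version-traceid-parentid-flags`, lowercase hex; the 2018 draft called the last field trace-options), and `TraceParent` and its methods are hypothetical names used only for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the fields a service must forward: trace-id,
// span-id (called parent-id in the spec), trace flags and tracestate.
final class TraceParent {
    // Version 00 only: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
    private static final Pattern V00 =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String spanId;
    final int flags;         // bit 0x01 = sampled
    final String traceState; // opaque vendor list; forwarded as-is, may be null

    private TraceParent(String traceId, String spanId, int flags, String traceState) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.flags = flags;
        this.traceState = traceState;
    }

    // Returns empty for a missing, malformed or all-zero header; the receiver
    // would then start a fresh trace instead of joining the caller's.
    static Optional<TraceParent> parse(String traceparent, String tracestate) {
        if (traceparent == null) {
            return Optional.empty();
        }
        Matcher m = V00.matcher(traceparent.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        if (allZeros(traceId) || allZeros(spanId)) {
            return Optional.empty();
        }
        int flags = Integer.parseInt(m.group(3), 16);
        return Optional.of(new TraceParent(traceId, spanId, flags, tracestate));
    }

    private static boolean allZeros(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }

    boolean sampled() {
        return (flags & 0x01) != 0;
    }

    // Header for an outgoing call made from a new child span: same trace-id,
    // the child's span-id, and (in this sketch) the caller's flags carried forward.
    String childHeader(String childSpanId) {
        return String.format("00-%s-%s-%02x", traceId, childSpanId, flags);
    }

    public static void main(String[] args) {
        parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01", "congo=t61rcWkgMzE")
            .ifPresent(tp -> System.out.println(
                tp.sampled() + " " + tp.childHeader("b7ad6b7169203331") + " " + tp.traceState));
    }
}
```

Rejecting malformed or all-zero IDs means a bad header starts a new trace rather than polluting an existing one, while the opaque `tracestate` value is passed along unchanged so vendor-specific data survives the hop.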

When OpenCensus was designed, I thought it was very important for the library to cover
all 3 of these components. Some may say that 1) is not important when deployed
internally, but with the new cloud providers this scenario is becoming more common; others may
say that 3) is not important, but in my opinion it becomes very important when instrumenting
client libraries (like the HBase client). FYI, there are other libraries that solve these issues
as well, such as Zipkin, but I am not here to suggest one particular library, just to explain the
concepts, the issues, and what is important to think about.
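
For item 3, a minimal sketch of the "implicit" approach using only the JDK: the current span is kept in a `ThreadLocal`, and tasks or whole `Executor`s are wrapped so the submitter's span is captured at submission time and re-installed on the worker thread. `CurrentSpan` and its methods are hypothetical names, the span is modeled as a plain `String` only to keep the example short, and gRPC's `io.grpc.Context` (linked above) provides a richer version of the same idea.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal "implicit" context: the current span id lives in a ThreadLocal.
final class CurrentSpan {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static String get() {
        return CURRENT.get();
    }

    private static void restore(String previous) {
        if (previous == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(previous);
        }
    }

    // Runs `body` with `span` as the current span, then restores what was current before.
    static <T> T with(String span, Callable<T> body) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(span);
        try {
            return body.call();
        } finally {
            restore(previous);
        }
    }

    // Captures the submitter's span now and re-installs it when the task runs elsewhere.
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = CURRENT.get();
        return () -> with(captured, task);
    }

    // Wraps an Executor so every Runnable it receives inherits the submitter's span.
    static Executor propagating(Executor delegate) {
        return command -> {
            String captured = CURRENT.get(); // read on the submitting thread
            delegate.execute(() -> {
                String previous = CURRENT.get(); // read on the worker thread
                CURRENT.set(captured);
                try {
                    command.run();
                } finally {
                    restore(previous);
                }
            });
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Executor traced = propagating(pool);
        with("span-42", () -> {
            pool.execute(() -> System.out.println("plain executor sees " + get()));
            traced.execute(() -> System.out.println("wrapped executor sees " + get()));
            return null;
        });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Running `main` should print `plain executor sees null` followed by `wrapped executor sees span-42`: the unwrapped task runs on a pool thread whose `ThreadLocal` was never set. This is why every async hand-off has to go through a wrapper, and why a single missed one silently breaks the trace.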

In my personal opinion, OpenTracing does not deal very well with 1 and 3 (probably on purpose),
but I am not an expert in OpenTracing, nor one of its owners/authors/co-authors, so I cannot comment
on what is good or bad in their design choices.

These are my thoughts on what you should consider when picking one library over another. Regarding
OpenCensus, we are happy to help if you have any questions about our design choices, or
about stats/metrics support in OpenCensus and why we think it is very important as
well.

PS: I hope the comment makes sense; it became longer than expected, but I tried to give an overview
of the whole instrumentation issue.

> Remove HTrace support
> ---------------------
>
>                 Key: HADOOP-15566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15566
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 3.1.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: security
>         Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png, ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making further releases.
> The Hadoop project currently has various hooks with HTrace. It seems in some cases (e.g. HDFS-13702)
> these hooks have had measurable performance overhead. Given these two factors, I think we
> should consider removing the HTrace integration. If there is someone willing to do the work,
> replacing it with OpenTracing might be a better choice since there is an active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
