hadoop-common-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15566) Remove HTrace support
Date Mon, 02 Jul 2018 16:28:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530144#comment-16530144 ]

Wei-Chiu Chuang commented on HADOOP-15566:
------------------------------------------

Hi Ben!
With help from [~tlipcon], I worked with [~fabbri] and [~rizaon] and spent a day or two
porting HTrace to OpenTracing. It turned out to be quite a fun exercise.

Most of the porting is mechanical: changing HTrace spans to OpenTracing spans. It took me a while
to figure out how to pass the trace id in OpenTracing, but it is doable. I was even able to add
some tracing code that was missing before.
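
For readers unfamiliar with how a trace id crosses a process boundary: tracing APIs generally use an inject/extract pattern, where the caller copies the span context into the request's metadata and the callee rebuilds it on the other side. In the OpenTracing Java API this corresponds to `Tracer.inject` and `Tracer.extract`. The sketch below is a dependency-free Python illustration of that pattern only. Every name in it (`SpanContext`, `inject`, `extract`, `start_child`, the carrier keys) is hypothetical, and it is not the code from the port described above.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for a tracer's span context."""
    trace_id: str
    span_id: str


# Hypothetical carrier keys; not a real wire format.
TRACE_KEY = "x-trace-id"
SPAN_KEY = "x-span-id"


def new_id() -> str:
    """Return a random 16-character hex identifier."""
    return uuid.uuid4().hex[:16]


def inject(ctx: SpanContext, carrier: dict) -> None:
    """Caller side: copy the context into the outgoing request metadata."""
    carrier[TRACE_KEY] = ctx.trace_id
    carrier[SPAN_KEY] = ctx.span_id


def extract(carrier: dict):
    """Callee side: rebuild the caller's context, or None if untraced."""
    if TRACE_KEY not in carrier or SPAN_KEY not in carrier:
        return None
    return SpanContext(carrier[TRACE_KEY], carrier[SPAN_KEY])


def start_child(parent):
    """Start a span that shares the parent's trace id, or begin a new trace."""
    trace_id = parent.trace_id if parent else new_id()
    return SpanContext(trace_id, new_id())


# Client: start a trace and attach its context to the request headers.
client_span = start_child(None)
headers = {}
inject(client_span, headers)

# Server: recover the context and continue the same trace.
server_span = start_child(extract(headers))
```

The server-side span gets its own span id but keeps the client's trace id, which is what lets a backend stitch the two sides into one trace.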

Some observations:
# Porting the code in Hadoop seems straightforward.
# I am not aware of anyone using HTrace in production, so I don't expect much resistance
to replacing it. (Shout out if this is not the case.)
# Embracing OpenTracing, which is becoming the de facto tracing standard, makes it possible
to trace end-to-end, from non-Hadoop applications into Hadoop.

Some possible hurdles:
# To pass the trace id around, we'll need to update the client -> namenode RPC messages, as well
as the client -> datanode RPC and the KMS REST API, so wire compatibility needs to be considered.
(Some messages already carry an HTrace trace id. Would it make sense to replace the HTrace
trace id field with an OpenTracing trace id field, or should the OpenTracing trace id be appended?
Hopefully there's not much overhead.)
# OpenTracing is just a set of APIs. We used Jaeger as the implementation. I can see people
wanting an implementation that is more neutral. For example, Jaeger comes from Uber, and
some people might not want to use it (hey, any Lyft developers here? :))
# Community adoption: I am aware that HBase uses HTrace, so if we switch to OpenTracing, we'll
need some coordination to convince the HBase community to switch too (I'd be happy to contribute).
I am also hoping to convince other communities to adopt OpenTracing. It's not very interesting
if OpenTracing is adopted in Hadoop but not in Hive, Spark, or Kafka.
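
On the replace-versus-append question in the first hurdle: appending a new field is usually the wire-compatible choice, because readers that do not know the field can skip it, whereas reusing an existing field for a different id risks older readers misinterpreting it. Hadoop RPC messages are protobuf-encoded, and protobuf readers skip unknown fields in this way. The sketch below illustrates the idea with a toy tag-length-value encoding in Python. The tags, field names, and layout are all hypothetical and are not Hadoop's actual RPC header format.

```python
import struct

# Hypothetical field tags; not Hadoop's actual RPC header layout.
HTRACE_TRACE_ID = 1
OPENTRACING_CONTEXT = 2


def encode(fields: dict) -> bytes:
    """Encode {tag: bytes} as repeated (tag, length, value) records."""
    out = bytearray()
    for tag, value in fields.items():
        # 1-byte tag, 2-byte big-endian length, then the raw value.
        out += struct.pack(">BH", tag, len(value)) + value
    return bytes(out)


def decode(data: bytes, known_tags: set) -> dict:
    """Decode records, skipping any tag this reader does not know."""
    fields, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        if tag in known_tags:
            fields[tag] = data[pos:pos + length]
        pos += length  # advance past the value whether or not it was kept
    return fields


# A new client sends both the legacy id and the appended context.
message = encode({
    HTRACE_TRACE_ID: struct.pack(">q", 42),
    OPENTRACING_CONTEXT: b"trace=abc;span=def",
})

# An old server only knows the legacy field and ignores the rest.
old_server = decode(message, {HTRACE_TRACE_ID})
# A new server understands both fields.
new_server = decode(message, {HTRACE_TRACE_ID, OPENTRACING_CONTEXT})
```

The cost of appending is the extra bytes carried on every traced message, which is the overhead the question above is concerned with.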

Thoughts?

> Remove HTrace support
> ---------------------
>
>                 Key: HADOOP-15566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15566
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 3.1.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> The HTrace incubator project has voted to retire itself and won't be making further releases.
> The Hadoop project currently has various hooks with HTrace. It seems in some cases (eg HDFS-13702)
> these hooks have had measurable performance overhead. Given these two factors, I think we
> should consider removing the HTrace integration. If there is someone willing to do the work,
> replacing it with OpenTracing might be a better choice since there is an active community.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

