hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HDDS-1636) Tracing id is not propagated via async datanode grpc call
Date Tue, 04 Jun 2019 17:37:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-1636?focusedWorklogId=253867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253867 ]

ASF GitHub Bot logged work on HDDS-1636:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jun/19 17:36
            Start Date: 04/Jun/19 17:36
    Worklog Time Spent: 10m 
      Work Description: xiaoyuyao commented on pull request #895: HDDS-1636. Tracing id is not propagated via async datanode grpc call
URL: https://github.com/apache/hadoop/pull/895#discussion_r290413345
 
 

 ##########
 File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/tracing/TracingUtil.java
 ##########
 @@ -99,7 +103,16 @@ public static Scope importAndCreateScope(String name, String encodedParent) {
     if (encodedParent != null && encodedParent.length() > 0) {
       StringBuilder builder = new StringBuilder();
       builder.append(encodedParent);
-      parentSpan = tracer.extract(StringCodec.FORMAT, builder);
+      try {
+        parentSpan = tracer.extract(StringCodec.FORMAT, builder);
+      } catch (Exception ex) {
+        if (LOG.isDebugEnabled()) {
+          LOG.debug("Can't extract tracing from the message.", ex);
+        } else {
+          LOG.warn(
 
 Review comment:
   Can we use a log throttler for this WARN message?
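
   A minimal sketch of what such throttling could look like, using only the JDK. The class name WarnThrottler and its methods are hypothetical and are not part of the pull request or of Hadoop; the clock is injected so the behaviour can be checked without sleeping. It is one possible shape, not the approach the patch ended up taking.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: lets at most one message through per interval. */
final class WarnThrottler {
  private static final long NEVER = Long.MIN_VALUE;

  private final long minIntervalMs;
  private final LongSupplier clockMs;
  private final AtomicLong lastLoggedMs = new AtomicLong(NEVER);
  private final AtomicLong suppressed = new AtomicLong();

  WarnThrottler(long minIntervalMs, LongSupplier clockMs) {
    this.minIntervalMs = minIntervalMs;
    this.clockMs = clockMs;
  }

  /** Returns true if the caller should emit the message now. */
  boolean shouldLog() {
    long now = clockMs.getAsLong();
    long last = lastLoggedMs.get();
    boolean due = last == NEVER || now - last >= minIntervalMs;
    // compareAndSet ensures only one of several racing threads wins the slot.
    if (due && lastLoggedMs.compareAndSet(last, now)) {
      return true;
    }
    suppressed.incrementAndGet();
    return false;
  }

  /** Number of messages dropped since the last call; resets the counter. */
  long drainSuppressed() {
    return suppressed.getAndSet(0);
  }
}

// Possible use inside the catch block (LOG is the class's existing logger):
//   if (THROTTLER.shouldLog()) {
//     LOG.warn("Can't extract tracing from the message ({} similar warnings suppressed).",
//         THROTTLER.drainSuppressed());
//   }
```

   Reporting the suppressed count with the next emitted warning keeps the volume of the problem visible even though most occurrences are dropped.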
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 253867)
    Time Spent: 50m  (was: 40m)

> Tracing id is not propagated via async datanode grpc call
> ---------------------------------------------------------
>
>                 Key: HDDS-1636
>                 URL: https://issues.apache.org/jira/browse/HDDS-1636
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Recently a new exception became visible in the datanode logs when using standard freon (STANDALONE)
> {code}
> datanode_2  | 2019-06-03 12:18:21 WARN  PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully.
> datanode_2  | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
> datanode_2  | 	at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
> datanode_2  | 	at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
> datanode_2  | 	at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
> datanode_2  | 	at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> datanode_2  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> datanode_2  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> {code}
> It turned out that tracingId propagation between XceiverClient and the server does not work reliably for Standalone pipelines and async commands:
>  1. In many places on the client side the traceId is filled with UUID.randomUUID().toString().
>  2. This random id is passed between the Output/InputStream and different parts of the client.
>  3. It is unnecessary, because in the XceiverClientGrpc and XceiverClientGrpc the traceId field is overridden with the real opentracing id anyway (sendCommand/sendCommandAsync).
>  4. The exception is XceiverClientGrpc.sendCommandAsync, where this override is accidentally missing.
> Things to fix:
>  1. Fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with the correct one).
>  2. Remove the usage of the UUID-based traceId (it's not used).
>  3. Improve the error logging on the server side when the traceId is invalid.
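
To illustrate why the UUID-based value in the log above is rejected, here is a small sketch of a server-side pre-check. It assumes the encoded tracer state follows Jaeger's colon-separated `traceId:spanId:parentId:flags` layout with hexadecimal fields; the class and method names are hypothetical and are not taken from TracingUtil or StringCodec.

```java
import java.util.Optional;

/** Hypothetical pre-check for an encoded tracer state string. */
final class TracerStateCheck {
  private TracerStateCheck() { }

  /**
   * Returns the four colon-separated fields, or empty if the value does not
   * match the assumed traceId:spanId:parentId:flags layout.
   */
  static Optional<String[]> split(String encoded) {
    if (encoded == null || encoded.isEmpty()) {
      return Optional.empty();
    }
    // limit -1 keeps trailing empty fields so "a:b:c:" is seen as 4 parts.
    String[] parts = encoded.split(":", -1);
    if (parts.length != 4) {
      return Optional.empty();
    }
    for (String part : parts) {
      boolean hex = !part.isEmpty()
          && part.chars().allMatch(c -> Character.digit(c, 16) >= 0);
      if (!hex) {
        return Optional.empty();
      }
    }
    return Optional.of(parts);
  }
}
```

A caller could skip span extraction and log a single concise message when `split` returns empty, instead of letting the tracer throw and print a full stack trace for every request.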



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

