hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDDS-1636) Tracing id is not propagated via async datanode grpc call
Date Mon, 03 Jun 2019 12:59:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-1636:
---------------------------------
    Labels: pull-request-available  (was: )

> Tracing id is not propagated via async datanode grpc call
> ---------------------------------------------------------
>
>                 Key: HDDS-1636
>                 URL: https://issues.apache.org/jira/browse/HDDS-1636
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>
> Recently a new exception has become visible in the datanode logs when using standard freon (STANDALONE pipeline):
> {code}
> datanode_2  | 2019-06-03 12:18:21 WARN  PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully.
> datanode_2  | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
> datanode_2  | 	at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
> datanode_2  | 	at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
> datanode_2  | 	at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
> datanode_2  | 	at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> datanode_2  | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  | 	at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> datanode_2  | 	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> datanode_2  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> datanode_2  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> {code}
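> To illustrate why the extraction fails (editor's sketch, not part of the original report or the actual Ozone/Jaeger code): Jaeger's text codec expects a serialized span context roughly of the form <trace-id>:<span-id>:<parent-span-id>:<flags>, while the value above is a client-generated random UUID concatenated with the command name, so the codec rejects it. A minimal, simplified check:
> {code}
> import java.util.regex.Pattern;
>
> public class TraceStateFormatCheck {
>
>   // Simplified approximation of the Jaeger textual span-context format:
>   // four colon-separated hex fields, e.g. "a1b2c3d4e5f60718:1a2b3c4d5e6f7081:0:1".
>   private static final Pattern JAEGER_TEXT_FORMAT =
>       Pattern.compile("^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+$");
>
>   public static void main(String[] args) {
>     String wellFormed = "a1b2c3d4e5f60718:1a2b3c4d5e6f7081:0:1";
>     // Truncated copy of the value from the log above: a random UUID plus the command name.
>     String fromLog = "7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk";
>
>     System.out.println(JAEGER_TEXT_FORMAT.matcher(wellFormed).matches()); // true
>     System.out.println(JAEGER_TEXT_FORMAT.matcher(fromLog).matches());    // false
>   }
> }
> {code}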
> It turned out that the traceId propagation between the XceiverClient and the server doesn't work correctly (in the case of STANDALONE pipelines and async commands):
>  1. There are many places (on the client side) where the traceId field is filled with UUID.randomUUID().toString().
>  2. This random id is propagated between the Output/InputStream and different parts of the client.
>  3. This is unnecessary, because in XceiverClientGrpc and XceiverClientRatis the traceId field is overridden with the real opentracing id anyway (sendCommand/sendCommandAsync); see the sketch after this list.
>  4. Except in XceiverClientGrpc.sendCommandAsync, where this step is accidentally missing.
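> A minimal sketch of the client-side pattern referenced in point 3 (editor's illustration using the generic io.opentracing API, not the actual TracingUtil/XceiverClientGrpc code): the id sent to the datanode should come from injecting the active span context, not from UUID.randomUUID():
> {code}
> import io.opentracing.Span;
> import io.opentracing.Tracer;
> import io.opentracing.propagation.Format;
> import io.opentracing.propagation.TextMapAdapter;
> import io.opentracing.util.GlobalTracer;
>
> import java.util.HashMap;
> import java.util.Map;
>
> public final class TraceIdExportSketch {
>
>   // Hypothetical helper: serialize the current span context into a single string,
>   // which is what the traceId field of the container command should carry.
>   static String exportSpanContext(String operation) {
>     Tracer tracer = GlobalTracer.get();
>     Span span = tracer.buildSpan(operation).start();
>     Map<String, String> carrier = new HashMap<>();
>     // With a Jaeger tracer this stores the context under a single key
>     // (e.g. "uber-trace-id") in "<trace>:<span>:<parent>:<flags>" form.
>     tracer.inject(span.context(), Format.Builtin.TEXT_MAP, new TextMapAdapter(carrier));
>     span.finish();
>     // Empty string if no real tracer is registered (GlobalTracer falls back to a no-op).
>     return carrier.values().stream().findFirst().orElse("");
>   }
>
>   public static void main(String[] args) {
>     System.out.println(exportSpanContext("WriteChunk"));
>   }
> }
> {code}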
> Things to fix:
>  1. Fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with the real opentracing one).
>  2. Remove the usage of the UUID-based traceId (it's not used for anything).
>  3. Improve the error logging on the server side when an invalid traceId is received (see the sketch below).
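> For point 3, a hedged sketch of what the improved server-side logging could look like (hypothetical helper, not the actual StringCodec/HddsDispatcher code). Jaeger's ExceptionCatchingExtractorDecorator swallows the malformed-state exception and returns null, so a non-empty traceId that yields no context can be logged together with its raw value:
> {code}
> import io.opentracing.SpanContext;
> import io.opentracing.Tracer;
> import io.opentracing.propagation.Format;
> import io.opentracing.propagation.TextMapAdapter;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> import java.util.Collections;
>
> public final class TraceIdServerLogging {
>
>   private static final Logger LOG =
>       LoggerFactory.getLogger(TraceIdServerLogging.class);
>
>   /** Extract the parent span context, logging the offending value if it cannot be parsed. */
>   public static SpanContext extractParent(Tracer tracer, String traceId, String requestType) {
>     if (traceId == null || traceId.isEmpty()) {
>       return null; // untraced request, nothing to warn about
>     }
>     SpanContext parent = tracer.extract(Format.Builtin.TEXT_MAP,
>         new TextMapAdapter(Collections.singletonMap("uber-trace-id", traceId)));
>     if (parent == null) {
>       // The extractor handled the malformed value gracefully; make the bad id visible.
>       LOG.warn("Invalid traceId '{}' in {} request; continuing without a parent span.",
>           traceId, requestType);
>     }
>     return parent;
>   }
> }
> {code}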



