htrace-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Colin McCabe <cmcc...@apache.org>
Subject Re: Problematic Trace in Multithreaded Environment
Date Fri, 28 Apr 2017 18:29:36 GMT
On Fri, Apr 28, 2017, at 10:54, Mike Miller wrote:
> Hello Apache Friends,
> 
> I am trying to track down a possible issue that Accumulo is having with a
> test in our Shell code that does some tracing.  The test is
> ShellServerIT.trace
> <https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/ShellServerIT.java#L1612>
> and
> it will periodically fail looking for the trace "sendMutations" on line
> 1624.
> 
> I can't figure out if this is an issue with Htrace version
> 3.1.0-incubation
> or something crazy we are doing with multithreading.  When the test is
> run
> through Maven, the trace seems to always be unrooted:
> The following spans are not rooted (probably due to a parent span of
> length
> 0ms):
>     2+136    master@0.0.0.0 PopulateMetadata
>     1+137    tserver@0.0.0.0 wal
>     1+137    tserver@0.0.0.0 update
>     4+140    master@0.0.0.0 FinishCreateTable
>     1+1340   master@0.0.0.0
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
> 1
>     1+1340   master@0.0.0.0 sendMutations
> 
> But when I perform the same trace through the Accumulo shell, entering
> the
> commands manually, it seems to work OK:
>     5+33230          master@0.0.0.0 close
>     1+33230            master@0.0.0.0 BinMutations 1
>     4+33231              master@0.0.0.0
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter
> 1
>     4+33231                master@0.0.0.0 sendMutations
> 
> The code that is doing the trace is in our TabletServerBatchWriter here
> <https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L858>
> .
> 
> I have thought of two possible reasons we are having problems...
> 1 - We are tracing in 2 separate thread pools and htrace gets confused:
> In org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java
> BinMutations trace:
> binningThreadPool.execute(Trace.wrap(new Runnable() {
> And sendMutations trace:
> sendThreadPool.submit(Trace.wrap(new SendTask(server)));
> 
> 2 - We change the name of the current thread just before starting the
> "sendMutations" trace:
> In TabletServerBatchWriter.SendTask.send()
> <https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchWriter.java#L856>
> 
> 
> Any ideas/insight would be greatly appreciated.  Thank you!
> - Mike

Hi Mike,

Changes in thread name will not be an issue, because HTrace uses
thread-local data to maintain its internal state.  Similarly, parent
spans with length 0, although not great to look at, still do count as
parents.

I think what trips up a lot of people tracing with thread pools is
failing to close traces.  Take a look and make sure you are not
experiencing this too.  In later versions of HTrace, we added warnings
for this kind of thing when we detected it.  Another thing to look at
would be which actual threads are performing which operations -- i.e. if
there are "handoffs" between threads, do the spans get detached and
reattached too?

best,
Colin

Mime
View raw message