htrace-dev mailing list archives

From Roberto Attias <>
Subject HTrace API comments
Date Sun, 11 Sep 2016 03:04:57 GMT
Hello,

I have some comments and concerns regarding the HTrace API, and was wondering whether extensions
or changes would be considered. I'm listing the most important ones here; if there is interest,
we can discuss them in more detail.

1) From the HTrace Developer Guide: 

TraceScope objects manage the lifespan of Span objects. When a TraceScope is created, it often
comes with an associated Span object. When this scope is closed, the Span will be closed as
well. “Closing” the scope means that the span is sent to a SpanReceiver for processing.

One implication of this model is that nested spans (for example, from instrumenting nested
function calls) are delivered to the receiver in reverse order, since the innermost function
completes before the outermost. This may add complexity to the logic in the span receiver.

Also, because information about a span is not delivered until the span is closed, delivery relies
on the program not terminating abruptly. In Java this is not much of a problem, but in C, what
happens if a series of nested function calls is instrumented with spans and the innermost
function crashes? As far as I can tell, none of the spans are delivered. This makes the
tracing API unreliable for bug analysis.

Would you consider a change where each API call produces at least one event sent to the SpanReceiver?
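
To make the concern concrete, here is a minimal sketch in Python. It is not HTrace code: ListReceiver, scope_on_close, scope_with_events and run are hypothetical names, and an uncaught exception stands in for an abrupt crash. It compares delivery only on close with delivery of one event per API call.

```python
from contextlib import contextmanager


class ListReceiver:
    """Hypothetical receiver that just records what it is sent."""

    def __init__(self):
        self.events = []

    def receive(self, kind, name):
        self.events.append((kind, name))


@contextmanager
def scope_on_close(receiver, name):
    # Close-time model: nothing is sent until the scope closes normally.
    # If the body raises, the line after `yield` never runs (no try/finally
    # on purpose, to imitate a process that dies without cleanup).
    yield
    receiver.receive("span", name)


@contextmanager
def scope_with_events(receiver, name):
    # Proposed model (assumption): every API call emits at least one event.
    receiver.receive("begin", name)
    yield
    receiver.receive("end", name)


def run(scope, receiver, crash=False):
    """Instrument two nested calls and return what the receiver saw."""
    try:
        with scope(receiver, "outer"):
            with scope(receiver, "inner"):
                if crash:
                    raise RuntimeError("simulated crash")
    except RuntimeError:
        pass
    return receiver.events


print(run(scope_on_close, ListReceiver()))                 # inner first, then outer
print(run(scope_on_close, ListReceiver(), crash=True))     # []
print(run(scope_with_events, ListReceiver(), crash=True))  # both begin events survive
```

In the close-time model the receiver sees the inner span before the outer one, and sees nothing at all after the simulated crash. With begin/end events it at least knows which spans were open when the program died.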

2) HTrace has a concept of spans having one or more parents. This makes it possible, for example,
to capture the fact that a process makes an RPC call to another. However, there is no information
about when within the span the caller calls the callee. A caller span may have two child spans,
representing the fact that it made two RPC calls, but the order in which those calls were made is
lost in the model. (Using the timestamps associated with the beginning of the callee spans is not
feasible, as RPC latencies may differ, or the clocks may simply not be aligned.)
Also, the only relation captured by the API is between blocks.

I propose a more general API with a concept of spans and points (timestamped sets of annotations),
and cause-effect relationships among points. An RPC call can be represented as a point in the
caller span marked as cause, and a (begin) point in the callee span marked as effect. This
is very flexible and makes it possible to capture all sorts of relationships, not just
parent-child. For example, a DMA operation may be initiated in one block and captured as a point,
with its completion captured as a point in a distinct block in the same entity (an abstraction
for a unit of concurrency).
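
As a rough illustration of the proposal, the sketch below models points and cause-effect links in Python. None of this is existing HTrace API: Point, Trace, add_point, link and calls_in_order are hypothetical names chosen for the example, and the timestamps are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    span_id: str
    timestamp: float  # read from the clock of the entity that recorded it
    annotations: dict = field(default_factory=dict)


@dataclass
class Trace:
    points: dict = field(default_factory=dict)  # point id -> Point
    causes: list = field(default_factory=list)  # (cause point id, effect point id)

    def add_point(self, point_id, span_id, timestamp, **annotations):
        self.points[point_id] = Point(span_id, timestamp, annotations)

    def link(self, cause_id, effect_id):
        self.causes.append((cause_id, effect_id))

    def calls_in_order(self, caller_span):
        """Spans affected by caller_span, ordered by the caller's own clock."""
        found = [
            (self.points[cause].timestamp, self.points[effect].span_id)
            for cause, effect in self.causes
            if self.points[cause].span_id == caller_span
        ]
        return [span_id for _, span_id in sorted(found)]


trace = Trace()
# Two RPC calls, recorded as points inside the caller span.
trace.add_point("c1", "caller", 1.0, kind="rpc-send")
trace.add_point("c2", "caller", 2.0, kind="rpc-send")
# Begin points of the callee spans; callee-a's clock is badly skewed.
trace.add_point("a0", "callee-a", 50.0, kind="begin")
trace.add_point("b0", "callee-b", 7.0, kind="begin")
trace.link("c1", "a0")
trace.link("c2", "b0")

print(trace.calls_in_order("caller"))  # ['callee-a', 'callee-b']
```

Sorting by the callee begin timestamps would put callee-b first. Ordering by the cause points uses only the caller's clock, so the call order is recovered despite the skew.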

3) There doesn't seem to be any provision in the HTrace API for clock domains.
In a distributed system, there may be processes running on the same host, processes running
in the same cluster, and processes running in different clusters. Different domains may have
different degrees of clock misalignment. Exposing this information through the API would allow
the backend or the UI that builds the trace to make more accurate inferences about how
concurrent entities line up.
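
One way a backend could use such information is sketched below in Python. Everything here is an assumption made for illustration: ClockDomain, max_skew, order and the skew bounds are hypothetical, not part of HTrace and not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDomain:
    cluster: str
    host: str


# Hypothetical worst-case clock skew, in seconds, per level of sharing.
SAME_HOST = 0.0
SAME_CLUSTER = 0.001
CROSS_CLUSTER = 0.1


def max_skew(a, b):
    """Skew bound implied by the closest domain the two entities share."""
    if a.cluster == b.cluster and a.host == b.host:
        return SAME_HOST
    if a.cluster == b.cluster:
        return SAME_CLUSTER
    return CROSS_CLUSTER


def order(ts_a, dom_a, ts_b, dom_b):
    """Return 'before', 'after' or 'unknown' for event a relative to event b."""
    skew = max_skew(dom_a, dom_b)
    if ts_a + skew < ts_b:
        return "before"
    if ts_b + skew < ts_a:
        return "after"
    return "unknown"  # the gap is smaller than the possible misalignment


h1 = ClockDomain("east", "host1")
h2 = ClockDomain("east", "host2")
west = ClockDomain("west", "host9")

print(order(1.0, h1, 1.0005, h1))    # before
print(order(1.0, h1, 1.0005, h2))    # unknown
print(order(1.0, h1, 2.0, west))     # before
```

The same pair of timestamps is ordered confidently within one host but reported as unknown across hosts, which is the kind of inference the trace builder cannot make without domain information.
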
4) Does the API provide a mechanism for creating "delegated traces"? What I mean by this is
that in some circumstances a thread may need to create traces on behalf of some other
element that does not have that capability. For example, a mobile device may have some custom
tracing mechanism, and attach the information to a request for the server. The server would
then need to create the HTrace trace from the existing data passed in the request (including

Let me know if there is interest in discussing changes at this level.
