Mailing-List: contact dev-help@htrace.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@htrace.incubator.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CA+qbEUO3TZVAnvNce5dK8FkGW9eBvSAupA_Ot4pNU0CM-fdK5g@mail.gmail.com>
References: 
 <2308_1438022041_t6RIY0Q1003658_CAJQBSuyK=RF_xATKuVH88UE8D8uQxvoCVaEU4h=Gzy5134uN6Q@mail.gmail.com>
 <CA+qbEUO3TZVAnvNce5dK8FkGW9eBvSAupA_Ot4pNU0CM-fdK5g@mail.gmail.com>
From: Daniel Lee <daniel@slice.com>
Date: Mon, 27 Jul 2015 14:32:30 -0700
Message-ID: 
 <CAJQBSuyJTziE7yZFKe_y5qvSK9eSN4Zh8tmbw-juuadb4nxXsQ@mail.gmail.com>
Subject: Re: HTRACE-215 Simplify the Sampler type - discussion
To: dev@htrace.incubator.apache.org
Content-Type: text/plain; charset=UTF-8

Hi Colin,

I'm not sure how Hadoop tracing is set up, but I also enable tracing via
a config setting.

I'm not sure I agree that creating multiple Tracer objects, each with
its own ProbabilitySampler, is an acceptable solution from a usability
standpoint. Consider an application that receives messages from clients
and wants to trace different message types and client types with
different probabilities. Every (message type, client type) pair then
needs its own Tracer and Sampler, so this gets ugly quickly, and
juggling that many tracers sounds confusing. I'm just going to wrap
everything in a custom class that contains the logic I used to have in
the Sampler.
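
A rough sketch of what such a wrapper could look like, assuming the
sampling decision is keyed on a (message type, client type) pair. All
names below (SamplingDecider, setRate, shouldTrace) are hypothetical and
are not part of the HTrace API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical wrapper, not an HTrace class: keeps one sampling rate per
// (message type, client type) pair and decides whether to trace a request.
class SamplingDecider {
    private final Map<String, Double> rates = new HashMap<>();
    private final double defaultRate;
    private final Random random;

    SamplingDecider(double defaultRate, Random random) {
        this.defaultRate = defaultRate;
        this.random = random;
    }

    // Register the probability (0.0 to 1.0) for one pair.
    void setRate(String messageType, String clientType, double rate) {
        rates.put(key(messageType, clientType), rate);
    }

    // True with the probability registered for the pair, falling back to
    // the default rate for pairs that were never registered.
    boolean shouldTrace(String messageType, String clientType) {
        double rate = rates.getOrDefault(key(messageType, clientType), defaultRate);
        return random.nextDouble() < rate;
    }

    private static String key(String messageType, String clientType) {
        return messageType + "|" + clientType;
    }
}
```

The caller would check shouldTrace(...) first and only then start a
span with an always-on sampler, so a single Tracer is enough no matter
how many pairs exist.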

Thanks,
Daniel

On Mon, Jul 27, 2015 at 12:00 PM, Colin P. McCabe <cmccabe@apache.org> wrote:
> Hi Daniel,
>
> The problem with the "T" in Sampler<T> is that it's
> application-specific.  The code for each application needs to be
> modified specifically to make use of a different T.  Ideally, Samplers
> should be pluggable, so that you can use any sampler with any HTraced
> code.  For example, I might run a test application with sampling set
> to "always" but in production, I would run with a probability sampler
> with some specific sampling rate.  But you can't do that when your
> sampler depends on being passed some application-specific data.
> You're stuck with only samplers that can work with that specific T.
>
> Consider a specific example: tracing Hadoop.  I'd like to be able to
> turn on tracing in Hadoop just by changing a config key.  But if I'm
> using a Sampler<T> with a non-trivial T, I can't do that.  I have to
> tell the customer, "first apply this patch to your Hadoop code to add
> the Ts, do a full build, and then put it into live production"...  The
> customer won't even follow me to step #1, let alone deploying the
> patched code in production.  It totally wrecks the usefulness of
> HTrace if you need to rebuild your code to use it.
>
> Another thing to think about is that we'd like to reduce the
> "boilerplate code" needed to add HTrace to an application.  Ideally
> the system would create the samplers you need from your
> HTraceConfiguration, rather than requiring the application to create
> and manage them manually.  Of course, applications should be able to
> programmatically add and remove Samplers as well, but only if they
> have a specific need to do that.
>
> I think that tracing different events with different probabilities is
> a nice feature.  There is a way to do that through the new API that I
> think is cleaner.  You would create multiple Tracer objects (Tracer
> will no longer be a singleton).  Each tracer would be configured with a
> ProbabilitySampler, but each would have a different sampling rate set.
> For the Foo code, you would call fooTracer.newTopLevelSpan(...), for
> the Bar code, you would call barTracer.newTopLevelSpan(...), and so
> forth.  In the new API, spans are always created from a specific
> Tracer and use the Samplers associated with that Tracer.
>
> This is similar to having different Log objects in log4j.  Perhaps you
> think the Foo system is not that interesting most of the time, so its
> log level defaults to WARN.  But if you think you're having a problem
> in the Foo system, you can set its log level to TRACE and then you see
> all the log messages that the Foo system has.  Same thing here, except
> that instead of Log objects, we have Tracer objects.  Instead of log
> messages, we have trace spans.  But we still have a lot of flexibility
> at runtime as a result of this.  And we don't need to recompile to
> trace.
>
> regards,
> Colin
>
> On Mon, Jul 27, 2015 at 11:33 AM, Daniel Lee <daniel@slice.com> wrote:
>> RE: https://issues.apache.org/jira/browse/HTRACE-215
>>
>> I was previously making use of this feature to trace different types
>> of inputs with different probabilities. It looks like I'll now have to
>> move all tracing logic completely outside of the htrace-related
>> classes and use only the Always and Never samplers, which seems weird.
>> Why even bother providing ProbabilitySampler when
>> (rand.nextDouble() < X ? AlwaysSampler.INSTANCE :
>> NeverSampler.INSTANCE) is available?
>>
>> Daniel