accumulo-user mailing list archives

From Josh Elser <>
Subject Re: questions regarding accumulo tracing
Date Thu, 13 Aug 2015 19:34:39 GMT
Jeff Kubina wrote:
> On Thu, Aug 13, 2015 at 2:52 PM, Josh Elser wrote:
>         1. Regarding the information above about accumulo tracing, if
>         more than
>         one server is listed in $ACCUMULO_HOME/conf/tracers how do the
>         clients
>         select the trace server to send their trace data to?
>     Tracers register themselves in ZooKeeper and the client tracing
>     libraries know to look in ZooKeeper to find them. You as a user
>     shouldn't have to worry about it -- it should happen automagically
>     for you.
> I wanted to know how well balanced the tracing data is processed.
> Is there a recommended system design with respect to the tracing
> servers? Should we dedicate a few nodes to being just tracing servers or
> is it best to have each tablet server also be a tracing server? If we
> make each tablet server also a tracing server will each tablet server
> just send its tracing data to the local tracing server?

One of the available trace servers is chosen at random per trace.
Clients cache the list of available trace servers, and as each new
trace comes in, the client chooses one of those hosts.
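
To make the selection concrete, here is a minimal sketch of "cache the
list, pick one at random per trace". It is not Accumulo's actual client
code: the class name, method name, and host strings are all made up for
illustration, and the real client gets its list from ZooKeeper rather
than from a constructor argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not Accumulo's actual client code.
class TracerPicker {
    // Cached snapshot of the registered trace servers (host:port strings).
    private final List<String> cachedTracers;
    private final Random random;

    TracerPicker(List<String> tracers, Random random) {
        if (tracers.isEmpty()) {
            throw new IllegalArgumentException("no trace servers available");
        }
        this.cachedTracers = new ArrayList<>(tracers);
        this.random = random;
    }

    // Called once per new trace: pick one cached host uniformly at random.
    String pickForNewTrace() {
        return cachedTracers.get(random.nextInt(cachedTracers.size()));
    }

    public static void main(String[] args) {
        // Made-up host names and ports, for demonstration only.
        TracerPicker picker = new TracerPicker(
            Arrays.asList("tracer1:12234", "tracer2:12234", "tracer3:12234"),
            new Random());
        for (int i = 0; i < 5; i++) {
            System.out.println("trace " + i + " -> " + picker.pickForNewTrace());
        }
    }
}
```

With uniform random choice, load evens out across trace servers only
over many traces; a single very large trace still lands on one host.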

If you're just using Accumulo's tracing, I think one server goes a very 
very long way. If you're sending client traces (or have custom 
applications also using it), you may want to add more. I don't have a 
good way to quantify it, sorry.

>         2. As an admin what is the best way to determine which tables have
>         recently been traced?
>     I'm not entirely sure what you mean by "[tables that have been
>     recently traced]". You can look at the "Recent Traces" page on the
>     monitor to get a list of the traces in the last X minutes.
>     Many operations going on in Accumulo will be getting traced. If you
>     have an active system, you'll constantly see new traces for minor
>     compactions and major compactions.
> Sometimes a trace will cause very high system CPU utilization (90%) and
> system load on the tracing server. When this becomes detrimental to the
> server I would like to determine what table was being traced at that
> time (to get the user/developer to refine the trace).

Traces aren't tied to a specific table; perhaps that's where the
confusion is coming in. A trace is just _an operation_. If I have a
client, I might just want to time some general operation. Or, like I
mentioned before, maybe it's a compaction in a TabletServer.

I think that the traces include the client information (IP address). Is 
that sufficient for your case? If you have a collection of users sending 
traces, you could consider enforcing that they all provide some 
attribute on traces which includes some easily-identifiable information too.
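
As a rough sketch of what such a convention could look like, the check
below rejects a set of trace attributes that lacks an identifying key.
This does not use Accumulo's tracing API; the class, the "submitter"
key, and the example values are assumptions for illustration, and where
you would run such a check is up to your own client wrapper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not part of Accumulo's tracing API.
class TraceAttributeCheck {
    // Assumed site convention: every client trace carries this key.
    static final String REQUIRED_KEY = "submitter";

    // Returns true when the attributes identify who started the trace.
    static boolean hasRequiredAttribute(Map<String, String> attributes) {
        String value = attributes.get(REQUIRED_KEY);
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(REQUIRED_KEY, "ingest-team"); // made-up value
        attributes.put("table", "events");           // optional extra context
        System.out.println("accepted: " + hasRequiredAttribute(attributes));
    }
}
```

If users also record the table name as an attribute, an admin could
map a heavy trace back to a table even though the trace itself is not
bound to one.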
