hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8213) DFSClient should not instantiate SpanReceiverHost
Date Wed, 22 Apr 2015 18:46:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507643#comment-14507643 ]

Colin Patrick McCabe commented on HDFS-8213:

Thanks again for kicking the tires on htrace, [~billie.rinaldi].  Let me see if I can get
to the bottom of this.

bq. As documented, each process must configure its own span receivers if it wants to use tracing.
If I set hadoop.htrace.span.receiver.classes to the empty string, then the NameNode and DataNode
will not do any tracing.

You are right that you need to set {{hadoop.htrace.span.receiver.classes}} in the NameNode
and DataNode configuration.  However, you need to avoid setting it in the Accumulo configuration.
Instead, use whatever configuration Accumulo uses to set this value.  This means that, currently,
you can't use the same config file for the NN and DN as for the DFSClient.

bq. If span receiver initialization in DFSClient is important to the use of the hadoop.htrace.sampler
configuration property, perhaps a compromise would be to perform SpanReceiverHost.getInstance
only when the sampler is set to something other than NeverSampler.

Keep in mind that {{hadoop.htrace.sampler}} is a completely different configuration key from
{{hadoop.htrace.span.receiver.classes}}.  If you are sampling at the level of Accumulo operations,
I would not recommend setting {{hadoop.htrace.sampler}} in any config file on the cluster.
You want all of the sampling to happen inside Accumulo.

bq. I think Billie Rinaldi is correct here; the client should not instantiate it's own SpanReceiverHost,
but instead depend on the process in which it resides to provide. This is how HBase client
works as well.

HBase is exactly the same.  In the case of HBase, you do not want to set {{hadoop.htrace.span.receiver.classes}}
in the HBase config files.  Instead, you would set {{hbase.htrace.span.receiver.classes}}.
Then HBase would create a span receiver, and DFSClient would not.

It seems like there is a hidden assumption here that you want to use the same config file
for everything.  But we really don't support that right now.

Getting rid of the SpanReceiverHost in DFSClient is not an option, since some people want to
trace just HDFS without tracing any other system.  Plus, it just kicks the problem up to a
higher level.  If my FooProcess wants to use both HTrace and Accumulo, FooProcess could easily
make the same argument that "Accumulo should not instantiate SpanReceiverHost" since FooProcess
is already doing that.  And since FooProcess uses the Accumulo client, it would conflict with
whatever Accumulo was configuring, if the same config file were used for everything.

One thing we could do to make this a little less painful is to deduplicate span receivers
inside the library.  So if both DFSClient and Accumulo requested an HTracedSpanReceiver, we
could simply create one instance of that.  This would allow us to use the same config file
for everything.
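
To make the deduplication idea a bit more concrete, here is a minimal sketch, assuming a
hypothetical process-wide registry keyed by receiver class name.  {{DemoReceiver}},
{{DedupReceiverRegistry}} and {{DedupDemo}} are names made up for illustration and are not
HTrace APIs; the real change would have to live inside the HTrace library itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a span receiver; not the real HTrace interface. */
interface DemoReceiver {
    void receive(String span);
}

/** Hypothetical registry that creates at most one receiver per class name. */
final class DedupReceiverRegistry {
    private final Map<String, DemoReceiver> byClassName = new LinkedHashMap<>();

    /** Returns the existing receiver for className, creating it only on the first request. */
    synchronized DemoReceiver getOrCreate(String className, Supplier<DemoReceiver> factory) {
        return byClassName.computeIfAbsent(className, k -> factory.get());
    }

    /** Delivers a span exactly once to each distinct receiver. */
    synchronized void deliver(String span) {
        for (DemoReceiver receiver : byClassName.values()) {
            receiver.receive(span);
        }
    }

    synchronized int size() {
        return byClassName.size();
    }
}

final class DedupDemo {
    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        DedupReceiverRegistry registry = new DedupReceiverRegistry();

        // Two callers (think DFSClient and Accumulo) request the same receiver class.
        DemoReceiver first = registry.getOrCreate("HTracedSpanReceiver", () -> delivered::add);
        DemoReceiver second = registry.getOrCreate("HTracedSpanReceiver", () -> delivered::add);

        registry.deliver("span-1");
        System.out.println(first == second); // prints "true"
        System.out.println(delivered);       // prints "[span-1]"
    }
}
```

With something along these lines, the second request gets back the instance the first one
created, so each span is delivered once rather than once per requester.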

As a side note, [~billie.rinaldi], can you explain how you configure which sampler and span
receiver Accumulo uses?  In HBase we set it to {{hbase.htrace.span.receiver.classes}}, etc.
I would recommend something like {{accumulo.htrace.span.receiver.classes}} for consistency.
This also allows you to use the same config file for everything since it doesn't conflict
with the keys which Hadoop uses to set these values.  That is the reason why we set up the
"hbase.htrace" namespace as separate from the "hadoop.htrace" namespace, if you see what
I'm saying.
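
As a rough illustration of the namespace idea, the sketch below looks up receiver classes under
a per-system prefix in one shared set of key/value pairs.  It uses a plain {{Map}} rather than
Hadoop's configuration classes, {{NamespacedTraceConfig}} and {{receiverClasses}} are
hypothetical names, {{accumulo.htrace.span.receiver.classes}} is only the key suggested above,
and the receiver names are placeholder strings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: resolves span receiver classes under a per-system key prefix. */
final class NamespacedTraceConfig {
    static final String SUFFIX = "htrace.span.receiver.classes";

    /** Reads "<prefix>.htrace.span.receiver.classes" as a comma-separated list. */
    static List<String> receiverClasses(Map<String, String> conf, String prefix) {
        String raw = conf.getOrDefault(prefix + "." + SUFFIX, "");
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One shared config; each system only looks at keys under its own prefix.
        Map<String, String> shared = new HashMap<>();
        shared.put("hadoop.htrace.span.receiver.classes", "LocalFileSpanReceiver");
        shared.put("accumulo.htrace.span.receiver.classes", "HTracedSpanReceiver");

        System.out.println(receiverClasses(shared, "hadoop"));   // prints "[LocalFileSpanReceiver]"
        System.out.println(receiverClasses(shared, "accumulo")); // prints "[HTracedSpanReceiver]"
        System.out.println(receiverClasses(shared, "hbase"));    // prints "[]"
    }
}
```

This only shows that lookups under different prefixes do not collide; it says nothing about
which component in a given process actually reads which prefix.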

> DFSClient should not instantiate SpanReceiverHost
> -------------------------------------------------
>                 Key: HDFS-8213
>                 URL: https://issues.apache.org/jira/browse/HDFS-8213
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Billie Rinaldi
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
> DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers
> through its own configuration.  This results in the same receivers being registered multiple
> times and spans being delivered more than once.  The documentation says SpanReceiverHost.getInstance
> should be issued once per process, so there is no expectation that DFSClient should do this.

This message was sent by Atlassian JIRA
