hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8069) Tracing implementation on DFSInputStream seriously degrades performance
Date Wed, 08 Apr 2015 01:54:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484542#comment-14484542 ]

Colin Patrick McCabe commented on HDFS-8069:

bq. Josh wrote: As Billie said, we're not tracing the tracing code.

Thanks for confirming this.  Just to double-check, can you confirm that you have {{hadoop.htrace.sampler}}
set to nothing (the default)?

bq. Josh wrote: \[a second cluster is\] A non-starter for me. We've had distributed tracing
support built into Accumulo for years without issue. To suddenly inform users that they need
to spin up a second cluster is a no-go.

Understood.  I think that the configuration you outlined, where {{hadoop.htrace.sampler}}
is set to NeverSampler (or left unset) and all sampling happens at the Accumulo level,
should work.  We just need to fix the issues we currently have.
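A toy model of the configuration described above, in which Accumulo makes every sampling decision and HDFS never starts a trace of its own but attaches child spans to a trace that is already active on the calling thread. Everything in it ({{SamplingSketch}}, {{runTopLevel}}, {{libraryCall}}) is hypothetical and is not the HTrace or Hadoop API; it only illustrates the assumed behavior with {{hadoop.htrace.sampler}} unset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "sample in the application, never sample in the library".
// Hypothetical names throughout; this is NOT the HTrace or Hadoop API.
final class SamplingSketch {
    // Span names that would be handed to a SpanReceiver, in close order.
    static final List<String> delivered = new ArrayList<>();

    // Name of the root span active on this thread, or null if untraced.
    private static final ThreadLocal<String> activeRoot = new ThreadLocal<>();

    private SamplingSketch() {}

    // Application side (Accumulo in this discussion): its own sampler
    // decides whether this top-level operation is traced at all.
    static void runTopLevel(String name, boolean sampled, Runnable body) {
        if (!sampled) {
            body.run(); // untraced: library code below must not create spans
            return;
        }
        activeRoot.set(name);
        try {
            body.run();
        } finally {
            activeRoot.remove();
            delivered.add(name);
        }
    }

    // Library side (HDFS with its sampler unset / never-sample): it never
    // starts a trace, but records a child span when a root is already active.
    static void libraryCall(String name, Runnable body) {
        String root = activeRoot.get();
        try {
            body.run();
        } finally {
            if (root != null) {
                delivered.add(root + "/" + name);
            }
        }
    }

    public static void main(String[] args) {
        runTopLevel("scan-1", false, () -> libraryCall("read", () -> {}));
        runTopLevel("scan-2", true, () -> libraryCall("read", () -> {}));
        System.out.println(delivered); // prints [scan-2/read, scan-2]
    }
}
```

The unsampled operation produces no spans at all, even though it calls into the library; only the operation Accumulo chose to sample yields a root span plus its HDFS child.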

bq. Billie wrote: I think this might be the case \[that HDFS tracing is too chatty\]. Creating
spans for byte array reads of one byte or more effectively makes us unable to trace client
operations if they happen to use DFSInputStream, which we are using to read walogs. Operations
involving Accumulo's RFiles seem to be in better shape since we are reading blocks from them.

I am going to open an issue in HDFS to trace only the cases where we actually fill the buffer
of the HDFS BlockReader.  I think that is a reasonable tradeoff, given that filling
the BlockReader buffer tends to be the main source of delay for HDFS readers.  Reading
a byte from an in-memory buffer that has already been filled rarely, if ever, causes any delay.
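A rough sketch of the proposed tradeoff: a hypothetical buffered stream ({{RefillTracingStream}}, not Hadoop code) that counts either one span per {{read()}} call or one span per refill of its internal buffer. The buffer stands in for the BlockReader buffer and the {{ByteArrayInputStream}} stands in for the remote source; sizes are arbitrary and nothing is actually sent to a SpanReceiver.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for DFSInputStream + BlockReader; not Hadoop code.
// It only counts how many spans each tracing policy would emit.
final class RefillTracingStream extends InputStream {
    private final InputStream source;   // stands in for the remote DataNode
    private final byte[] buf;           // stands in for the BlockReader buffer
    private final boolean spanPerRead;  // true: a span on every read() call
    private int pos = 0;
    private int limit = 0;
    int spans = 0;                      // spans that would reach the receiver

    RefillTracingStream(InputStream source, int bufferSize, boolean spanPerRead) {
        this.source = source;
        this.buf = new byte[bufferSize];
        this.spanPerRead = spanPerRead;
    }

    // The only place that does (potentially slow) I/O.
    private boolean refill() throws IOException {
        if (!spanPerRead) {
            spans++;                    // proposed policy: one span per fill
        }
        int n = source.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        if (spanPerRead) {
            spans++;                    // behavior reported in 2.7.0: one span per call
        }
        if (pos >= limit && !refill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    // Reads a zero-filled "file" one byte at a time; returns the span count.
    static int spansForByteAtATimeRead(int fileSize, int bufferSize, boolean spanPerRead) {
        try (RefillTracingStream in = new RefillTracingStream(
                new ByteArrayInputStream(new byte[fileSize]), bufferSize, spanPerRead)) {
            while (in.read() != -1) {
                // discard the byte; only the span count matters here
            }
            return in.spans;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(spansForByteAtATimeRead(1000, 100, true));  // 1001
        System.out.println(spansForByteAtATimeRead(1000, 100, false)); // 11
    }
}
```

Reading 1000 bytes one at a time through a 100-byte buffer yields 1001 spans under per-call tracing, versus 11 under per-refill tracing (ten fills plus the final fill that hits end of stream).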

bq. Billie wrote: We are only tracing one Accumulo operation, but it is a fairly complex operation.
So even if we traced this operation less often, we would still run into this issue.

If the Accumulo operation is big enough, it may be necessary to split it into multiple HTrace
spans.  For example, I think tracing an entire compaction would be too big.  We may have to
experiment with this somewhat.
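One possible way to split a large operation, sketched under the assumption that it can be processed in independent chunks: emit one span per fixed-size batch instead of one span for the whole operation, so no single span covers more than {{batchSize}} units of work. {{BatchedSpans}}, its {{Span}} class, and the batching policy are all hypothetical; this is not how Accumulo or HTrace actually structure compaction traces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; "Span" is a plain value class here, not HTrace's Span.
final class BatchedSpans {
    static final class Span {
        final String description;
        final int itemsCovered;

        Span(String description, int itemsCovered) {
            this.description = description;
            this.itemsCovered = itemsCovered;
        }
    }

    // Instead of one span wrapping an entire long-running operation, emit one
    // span per batch so that no single span covers more than batchSize items.
    static List<Span> traceInBatches(String op, int totalItems, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Span> spans = new ArrayList<>();
        for (int start = 0; start < totalItems; start += batchSize) {
            int end = Math.min(totalItems, start + batchSize);
            // ... process items [start, end) here (hypothetical unit of work) ...
            spans.add(new Span(op + " [" + start + ", " + end + ")", end - start));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : traceInBatches("compaction", 2500, 1000)) {
            System.out.println(s.description + " covers " + s.itemsCovered + " items");
        }
    }
}
```

With 2500 items and a batch size of 1000 this produces three spans covering 1000, 1000, and 500 items. The batch size is exactly the kind of knob that would need the experimentation mentioned above.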

> Tracing implementation on DFSInputStream seriously degrades performance
> -----------------------------------------------------------------------
>                 Key: HDFS-8069
>                 URL: https://issues.apache.org/jira/browse/HDFS-8069
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.0
>            Reporter: Josh Elser
>            Priority: Critical
> I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a serious performance
impact when Accumulo registers itself as a SpanReceiver.
> The context of the test in which I noticed the impact is that an Accumulo process reads
a series of updates from a write-ahead log. This is just reading a series of Writable objects
from a file in HDFS. With tracing enabled, I waited for at least 10 minutes and the server
still hadn't read a ~300MB file.
> Doing a poor-man's inspection via repeated thread dumps, I always see something like
the following:
> {noformat}
> "replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
>         at org.apache.htrace.Tracer.deliver(Tracer.java:80)
>         at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
>         - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
>         at org.apache.htrace.TraceScope.close(TraceScope.java:78)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.readByte(DataInputStream.java:265)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>         at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
>        ... more accumulo code omitted...
> {noformat}
> What I'm seeing here is that reading a single byte (in WritableUtils.readVLong) causes
a new Span to be created and closed (which includes a flush to the SpanReceiver). This results in
an extreme number of spans for {{DFSInputStream.byteArrayRead}} just for reading a file from
HDFS -- over 700k spans for reading a file of only a few hundred MB.
> Perhaps there's something different we need to do for the SpanReceiver in Accumulo? I'm
not entirely sure, but this was rather unexpected.
> cc/ [~cmccabe]
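To see why deserializing Writables turns into many single-byte reads, consider how a variable-length long is decoded. The sketch below is a simplified re-implementation modeled on Hadoop's {{WritableUtils.readVLong}} (it is not the Hadoop class itself, and the byte layout is my reading of Hadoop's vlong format), wrapped in a hypothetical {{CountingStream}} that counts the single-byte {{read()}} calls reaching the underlying stream. In the 2.7.0 stack trace quoted above, each such call went through {{DFSInputStream.read()}} and opened and closed a span.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class VLongReadCount {
    // Counts single-byte read() calls that reach the underlying stream.
    static final class CountingStream extends FilterInputStream {
        int singleByteReads = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteReads++;
            return super.read();
        }
    }

    // Simplified decoder modeled on Hadoop's WritableUtils.readVLong:
    // one header byte, then up to 8 payload bytes, each read via readByte().
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                  // small values fit in the header byte
        }
        boolean negative = first < -120;
        int payloadBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < payloadBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;  // negatives are stored complemented
    }

    // Decodes one vlong and returns {value, single-byte reads it needed}.
    static long[] decodeAndCount(byte[] encoded) {
        CountingStream counter = new CountingStream(new ByteArrayInputStream(encoded));
        try {
            long value = readVLong(new DataInputStream(counter));
            return new long[] {value, counter.singleByteReads};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 300 encoded as header byte -114 followed by payload 0x01 0x2C.
        long[] r = decodeAndCount(new byte[] {(byte) -114, 0x01, 0x2C});
        System.out.println(r[0] + " decoded with " + r[1] + " single-byte reads");
    }
}
```

Decoding the value 300 costs three separate {{read()}} calls, and a single Mutation contains many such fields, which helps explain how one write-ahead log file can generate hundreds of thousands of spans when every single-byte read is traced.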

This message was sent by Atlassian JIRA
