hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Billie Rinaldi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8069) Tracing implementation on DFSInputStream seriously degrades performance
Date Mon, 06 Apr 2015 23:51:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482224#comment-14482224
] 

Billie Rinaldi commented on HDFS-8069:
--------------------------------------

We aren't tracing in the span collector.  We are only tracing one Accumulo operation, but
it is a fairly complex operation.  So even if we traced this operation less often, we would
still run into this issue.  I'm not sure I understand how the DFSInputStream tracing is supposed
to be used.  Is it possible introduce sampling of DFSInputStream read operations within a
current trace that has been enabled?  I'm also confused about why it would create a span for
a single read operation, which is usually just pulling some bytes from an in-memory buffer
(?), rather than only creating spans in the BlockReaders.

> Tracing implementation on DFSInputStream seriously degrades performance
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8069
>                 URL: https://issues.apache.org/jira/browse/HDFS-8069
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.0
>            Reporter: Josh Elser
>            Priority: Critical
>
> I've been doing some testing of Accumulo with HDFS 2.7.0 and have noticed a serious performance
impact when Accumulo registers itself as a SpanReceiver.
> The context of the test which I noticed the impact is that an Accumulo process reads
a series of updates from a write-ahead log. This is just reading a series of Writable objects
from a file in HDFS. With tracing enabled, I waited for at least 10 minutes and the server
still hadn't read a ~300MB file.
> Doing a poor-man's inspection via repeated thread dumps, I always see something like
the following:
> {noformat}
> "replication task 2" daemon prio=10 tid=0x0000000002842800 nid=0x794d runnable [0x00007f6c7b1ec000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:959)
>         at org.apache.htrace.Tracer.deliver(Tracer.java:80)
>         at org.apache.htrace.impl.MilliSpan.stop(MilliSpan.java:177)
>         - locked <0x000000077a770730> (a org.apache.htrace.impl.MilliSpan)
>         at org.apache.htrace.TraceScope.close(TraceScope.java:78)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:898)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
>         - locked <0x000000079fa39a48> (a org.apache.hadoop.hdfs.DFSInputStream)
>         at java.io.DataInputStream.readByte(DataInputStream.java:265)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329)
>         at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:951)
>        ... more accumulo code omitted...
> {noformat}
> What I'm seeing here is that reading a single byte (in WritableUtils.readVLong) is causing
a new Span creation and close (which includes a flush to the SpanReceiver). This results in
an extreme amount of spans for {{DFSInputStream.byteArrayRead}} just for reading a file from
HDFS -- over 700k spans for just reading a few hundred MB file.
> Perhaps there's something different we need to do for the SpanReceiver in Accumulo? I'm
not entirely sure, but this was rather unexpected.
> cc/ [~cmccabe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message