hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5276) FileSystem.Statistics got performance issue on multi-thread read/write.
Date Tue, 01 Oct 2013 00:59:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782500#comment-13782500 ]

Colin Patrick McCabe commented on HDFS-5276:
--------------------------------------------

bq. The counts from the threads, even though they are not running any more, should be included
in the stats count. Currently the statistics object is passed from the client to the file system.
This implementation may need incompatible changes.

There's nothing incompatible about it.  The objects used for thread-local storage are not
the same object that the client passes around.  My point is that if you keep adding objects
whenever a thread is created, you also have to get rid of them when the thread is destroyed.
Otherwise, you have a memory leak.

It would be really simple to come up with a patch that does thread-local counters.  I don't
have time today, but maybe later this week.
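
To illustrate the idea, here is a minimal sketch of thread-local counters that keeps the counts of exited threads while still dropping their per-thread objects. It is not the actual FileSystem.Statistics code or the eventual patch; the class name ThreadLocalBytesRead, the Cell type, and the liveCells() helper are all hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; not the real FileSystem.Statistics implementation. */
class ThreadLocalBytesRead {
    /** Per-thread cell. Only the owning thread ever writes bytesRead. */
    private static final class Cell {
        // Weak reference so the cell does not keep its (dead) thread reachable.
        final WeakReference<Thread> owner = new WeakReference<>(Thread.currentThread());
        // volatile so the aggregating thread sees the owner's latest value.
        volatile long bytesRead;
    }

    private final List<Cell> cells = new ArrayList<>(); // guarded by "this"
    private long retired; // counts folded in from exited threads; guarded by "this"

    private final ThreadLocal<Cell> local = new ThreadLocal<Cell>() {
        @Override protected Cell initialValue() {
            Cell c = new Cell();
            // Registration happens once per thread, so this lock is off the hot path.
            synchronized (ThreadLocalBytesRead.this) { cells.add(c); }
            return c;
        }
    };

    /** Hot path: updates only the calling thread's own cell. */
    void incrementBytesRead(long n) {
        Cell c = local.get();
        c.bytesRead = c.bytesRead + n; // single writer, so no atomic RMW is needed
    }

    /** Slow path: sums all cells and retires cells whose thread has exited. */
    synchronized long getBytesRead() {
        long sum = retired;
        for (Iterator<Cell> it = cells.iterator(); it.hasNext();) {
            Cell c = it.next();
            Thread t = c.owner.get();
            if (t == null || !t.isAlive()) {
                retired += c.bytesRead; // keep the count for later calls...
                it.remove();            // ...but drop the cell to avoid the leak
            }
            sum += c.bytesRead;
        }
        return sum;
    }

    /** Number of per-thread cells currently retained (for illustration only). */
    synchronized int liveCells() { return cells.size(); }
}
```

In this sketch, cleanup is lazy: a dead thread's cell is folded into the retired total the next time someone reads the statistic, so the list of cells does not grow without bound as threads come and go.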

bq. Controlling issues such as cache alignment and synchronization from the JVM is also essential
to avoid contention. Since the information is simply unavailable to Java programs, in my
personal opinion the problem might be better addressed in the JVM, or at even lower abstraction
levels.

The JVM has some problems, but this isn't one of them.  Accessing the same memory from many
different threads at once is inherently slow on modern multicore CPUs because of cache coherency
issues.  It's up to software designers to avoid this if they want the best performance.
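
As a small illustration of avoiding a single contended memory location in application code, the sketch below puts a single AtomicLong next to java.util.concurrent.atomic.LongAdder, which spreads updates across several internal cells and sums them on read. This assumes Java 8 or later and is a standard-library alternative, not what this issue proposes; the ContendedCounters class is hypothetical. Both counters end with the same total; only the update pattern differs.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch comparing one shared counter with a striped one. */
class ContendedCounters {
    final AtomicLong single = new AtomicLong(); // every thread hits the same location
    final LongAdder striped = new LongAdder();  // updates are spread over internal cells

    /** Starts the given number of threads, each incrementing both counters. */
    void run(int threads, int perThread) throws InterruptedException {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    single.getAndAdd(1); // contended read-modify-write
                    striped.add(1);      // falls back to per-cell updates under contention
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
    }
}
```

After run(8, 10000), single.get() and striped.sum() both return 80000. Any speed difference depends on the hardware and thread count and would need to be measured.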

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFSStatisticTest.java, hdfs-test.PNG,
jstack-trace.PNG
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, and each read/write on
HDFS leads to an AtomicLong.getAndAdd() call. AtomicLong does not perform well under many threads (say,
more than 30), so it may cause a serious performance issue. During our Spark test
profile, 32 threads read data from HDFS, and about 70% of CPU time was spent in FileSystem.Statistics.incrementBytesRead().



--
This message was sent by Atlassian JIRA
(v6.1#6144)
