hadoop-hdfs-issues mailing list archives

From "Binglin Chang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5276) FileSystem.Statistics got performance issue on multi-thread read/write.
Date Tue, 01 Oct 2013 14:50:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783009#comment-13783009 ]

Binglin Chang commented on HDFS-5276:
-------------------------------------

bq. Why not keep thread-local read statistics and sum them up periodically? That seems better
than disabling this entirely.
ThreadLocal variables also have performance penalties in Java, although I have not tested this myself;
see http://stackoverflow.com/questions/609826/performance-of-threadlocal-variable. Using them
frequently in an inner loop may also cause a performance penalty.
Since atomic variables and ThreadLocal both have a performance impact (big or small), and most
applications that use the HDFS client do not use statistics at all, I think we should at least give them
an option to disable it. We can also do optimizations; the two approaches do not conflict.
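The reviewer's suggestion quoted above (thread-local counters summed on demand), combined with an option to switch counting off, could look roughly like the following. This is a hypothetical sketch, not Hadoop's `FileSystem.Statistics` API: `ThreadLocalBytesRead` and its `enabled` flag are illustrative names. Each thread writes only its own counter, so the hot path performs no contended compare-and-swap; the cost moves to the reader, which walks every counter ever created. Whether `ThreadLocal.get()` is cheap enough in a tight read loop is exactly the open question raised above, so this would still need measuring.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical sketch (not Hadoop's FileSystem.Statistics API): per-thread
 * byte counters that are only summed when someone asks for the total.
 */
public class ThreadLocalBytesRead {

    /** One counter per thread; only its owning thread ever writes it. */
    private static final class Counter {
        // volatile so readers on other threads see recent values; the
        // single-writer rule makes "value += n" safe without a CAS.
        volatile long value;
    }

    // Every counter ever created, so totals survive thread exit
    // (at the cost of one small object per thread that ever counted).
    private final ConcurrentLinkedQueue<Counter> all = new ConcurrentLinkedQueue<>();

    private final ThreadLocal<Counter> local = ThreadLocal.withInitial(() -> {
        Counter c = new Counter();
        all.add(c);
        return c;
    });

    // Hypothetical switch illustrating the "option to disable it" idea.
    private final boolean enabled;

    public ThreadLocalBytesRead(boolean enabled) {
        this.enabled = enabled;
    }

    /** Hot path: writes only this thread's counter, never a shared one. */
    public void incrementBytesRead(long n) {
        if (!enabled) {
            return;
        }
        Counter c = local.get();
        c.value += n; // single writer, so no lost updates
    }

    /** Slow path: sums all per-thread counters (a snapshot, not atomic). */
    public long getBytesRead() {
        long sum = 0;
        for (Counter c : all) {
            sum += c.value;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalBytesRead stats = new ThreadLocalBytesRead(true);
        Thread[] workers = new Thread[32];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    stats.incrementBytesRead(4096);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(stats.getBytesRead()); // prints 13107200000
    }
}
```

With `enabled` set to false the increment is a single branch, which is the cheapest possible hot path for clients that never read the statistics.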

The Hadoop fs client is too heavyweight now, with too many threads and too much state. Imagine an NM/TaskTracker
with 40+ tasks, each with several HDFS clients that each have multiple threads: we may get thousands of
threads just for HDFS read/write, which causes a lot of context-switch overhead.


> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFSStatisticTest.java, hdfs-test.PNG,
>                      jstack-trace.PNG
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme, so each read/write on
> HDFS leads to an AtomicLong.getAndAdd(). AtomicLong does not perform well under many threads (say,
> more than 30), so it may cause a serious performance issue. In our Spark test profile, with
> 32 threads reading data from HDFS, about 70% of CPU time was spent in FileSystem.Statistics.incrementBytesRead().
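To look at the contention pattern described above in isolation, a rough harness like the following could be used. It is a sketch under stated assumptions: `SharedCounterContention` and `run` are hypothetical names, the 4096-byte increment stands in for one buffered read, and the comparison uses `java.util.concurrent.atomic.LongAdder`, which only arrived in JDK 8 (after this comment was written). LongAdder is swapped in here as one well-known striped-counter technique, not as the fix adopted for HDFS-5276. Wall-clock timings from such a loop are noisy and machine-dependent; a dedicated harness such as JMH would be needed for trustworthy numbers. Only the totals are deterministic: both should equal threads × perThread × 4096.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;

/**
 * Rough, hypothetical sketch of the pattern in HDFS-5276: many threads
 * bumping one shared counter on every read. Timings are illustrative only.
 */
public class SharedCounterContention {

    /** Starts `threads` workers, each calling `inc` `perThread` times; returns elapsed nanos. */
    static long run(int threads, int perThread, LongConsumer inc) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    inc.accept(4096); // pretend each call read a 4 KB buffer
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 32;
        int perThread = 1_000_000;

        // One shared AtomicLong: every getAndAdd contends on the same cache line.
        AtomicLong shared = new AtomicLong();
        long atomicNanos = run(threads, perThread, shared::getAndAdd);

        // LongAdder (JDK 8+) spreads updates across cells and sums them on read.
        LongAdder striped = new LongAdder();
        long adderNanos = run(threads, perThread, striped::add);

        System.out.printf("AtomicLong: total=%d, %d ms%n", shared.get(), atomicNanos / 1_000_000);
        System.out.printf("LongAdder:  total=%d, %d ms%n", striped.sum(), adderNanos / 1_000_000);
    }
}
```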



--
This message was sent by Atlassian JIRA
(v6.1#6144)
