Date: Tue, 1 Oct 2013 14:50:25 +0000 (UTC)
From: "Binglin Chang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5276) FileSystem.Statistics got performance issue on multi-thread read/write.

[ https://issues.apache.org/jira/browse/HDFS-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783009#comment-13783009 ]

Binglin Chang commented on HDFS-5276:
-------------------------------------

bq. Why not keep thread-local read statistics and sum them up periodically? That seems better than disabling this entirely.

ThreadLocal variables also carry a performance penalty in Java, although I have not tested it; see http://stackoverflow.com/questions/609826/performance-of-threadlocal-variable.
Using them frequently in an inner loop may also cause a performance penalty.

Since both atomic variables and ThreadLocal have a performance impact (big or small), and most applications that use the HDFS client do not use statistics at all, I think we can at least give them an option to disable it. We can also do optimizations; the two do not conflict.

The Hadoop fs client is too heavyweight now, with too many threads and too much state. Imagine an NM/TaskTracker with 40+ tasks, each with several HDFS clients that have multiple threads: we may get thousands of threads just for HDFS read/write, which will cause a lot of context-switch overhead.

> FileSystem.Statistics got performance issue on multi-thread read/write.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5276
>                 URL: https://issues.apache.org/jira/browse/HDFS-5276
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha
>            Reporter: Chengxiang Li
>         Attachments: DisableFSReadWriteBytesStat.patch, HDFSStatisticTest.java, hdfs-test.PNG, jstack-trace.PNG
>
>
> FileSystem.Statistics is a singleton variable for each FS scheme; each read/write on HDFS leads to an AtomicLong.getAndAdd(). AtomicLong does not perform well under many threads (say, more than 30), so it may cause a serious performance issue. During our Spark test profile, with 32 threads reading data from HDFS, about 70% of CPU time was spent in FileSystem.Statistics.incrementBytesRead().

--
This message was sent by Atlassian JIRA
(v6.1#6144)
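
For illustration only, the thread-local idea quoted in the comment above (keep per-thread read statistics and sum them when needed) could look roughly like the sketch below. The class name PerThreadByteCounter and its layout are hypothetical and are not taken from Hadoop or from the attached patch; only the method name incrementBytesRead mirrors the one mentioned in the issue. It assumes Java 8 or later, and it does not measure or settle the ThreadLocal lookup cost discussed in the comment.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Hadoop's FileSystem.Statistics implementation.
final class PerThreadByteCounter {
    // All per-thread cells, kept so that a reader can sum them.
    private final ConcurrentLinkedQueue<AtomicLong> cells = new ConcurrentLinkedQueue<>();

    // Each thread lazily creates and registers its own cell on first use.
    private final ThreadLocal<AtomicLong> local = ThreadLocal.withInitial(() -> {
        AtomicLong cell = new AtomicLong();
        cells.add(cell);
        return cell;
    });

    // Hot path: only the calling thread ever writes to its cell,
    // so the atomic update is not contended by other writers.
    void incrementBytesRead(long newBytes) {
        local.get().getAndAdd(newBytes);
    }

    // Cold path: sum every cell. The result is a moving snapshot
    // if writers are still running.
    long getBytesRead() {
        long sum = 0;
        for (AtomicLong cell : cells) {
            sum += cell.get();
        }
        return sum;
    }
}
```

A caller would invoke incrementBytesRead(n) from each reading thread and getBytesRead() only when the total is wanted. One known limitation of this sketch is that cells of finished threads are never removed from the queue. On Java 8 and later, java.util.concurrent.atomic.LongAdder is a standard-library class built for the same many-writers, few-readers pattern and may be a simpler choice.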