Date: Tue, 10 Aug 2010 14:49:17 -0400 (EDT)
From: "Hong Tang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1338) Improve TestDFSIO

[ https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896997#action_12896997 ]

Hong Tang commented on HDFS-1338:
---------------------------------

I think the goal of TestDFSIO is to benchmark peak HDFS throughput under a typical MR usage pattern. This means:

- Files should be replicated.
- Files should be spread across nodes relatively evenly. (Run one map per node on the cluster, and write out data evenly.)
- Locality information should be exposed to the MR framework correctly. (The benchmark should just use FileInputFormat instead of writing a side file.)
- The dataset should not fit in the OS buffer cache. (Configure the benchmark so that the total amount of data > total RAM.)
- Throughput should be aggregated as a time series, and we should ignore the ramp-up and cool-down phases of the execution. (The output of each map should be a time series of counters of bytes read so far. The reporting may calculate the max and average of the mid-1/3 of the time series.)
- We should minimize the variations of MR scheduling. (Run one wave of maps, and increase the block size so that each map runs for at least 20 to 30 seconds.)

> Improve TestDFSIO
> -----------------
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Arun C Murthy
>
> Currently the read test in the TestDFSIO benchmark just opens a large side file and measures the read performance. The MR scheduler has no opportunity to do *any* optimization for the TestDFSIO MR application. The side effect of this is that it is *very* hard to do any meaningful analysis of the benchmark results, i.e. to check whether node-local, rack-local, or off-switch read performance improved or degraded.
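
The mid-1/3 reporting suggested in the comment could be sketched roughly as follows. This is only an illustration, not code from TestDFSIO: the class name `MidThirdThroughput`, the `summarize` method, and the input layout (one row of cumulative "bytes read so far" samples per map, all taken at the same fixed interval) are assumptions made for the example.

```java
/** Hypothetical helper for illustration; not part of TestDFSIO. */
final class MidThirdThroughput {

    /** Max and average aggregate rate over the middle third of the run. */
    static final class Summary {
        final double maxBytesPerSec;
        final double avgBytesPerSec;

        Summary(double maxBytesPerSec, double avgBytesPerSec) {
            this.maxBytesPerSec = maxBytesPerSec;
            this.avgBytesPerSec = avgBytesPerSec;
        }
    }

    /**
     * @param cumulativeBytesPerMap one row per map; row[i] is the number of
     *        bytes that map had read by the end of interval i (assumed layout)
     * @param intervalSeconds length of one sampling interval (assumed fixed)
     */
    static Summary summarize(long[][] cumulativeBytesPerMap, double intervalSeconds) {
        if (cumulativeBytesPerMap.length == 0 || cumulativeBytesPerMap[0].length == 0) {
            throw new IllegalArgumentException("need at least one map and one sample");
        }
        int samples = cumulativeBytesPerMap[0].length;

        // Aggregate the cumulative counters across all maps.
        long[] total = new long[samples];
        for (long[] row : cumulativeBytesPerMap) {
            if (row.length != samples) {
                throw new IllegalArgumentException("all maps must have the same number of samples");
            }
            for (int i = 0; i < samples; i++) {
                total[i] += row[i];
            }
        }

        // Turn cumulative bytes into a per-interval rate in bytes per second.
        double[] rate = new double[samples];
        long previous = 0;
        for (int i = 0; i < samples; i++) {
            rate[i] = (total[i] - previous) / intervalSeconds;
            previous = total[i];
        }

        // Keep only the middle third, dropping ramp-up and cool-down samples.
        int from = samples / 3;          // inclusive
        int to = samples - samples / 3;  // exclusive
        double max = 0.0;
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            max = Math.max(max, rate[i]);
            sum += rate[i];
        }
        return new Summary(max, sum / (to - from));
    }

    public static void main(String[] args) {
        long[][] perMap = {
            {10, 30, 60, 110, 130, 140},
            {5, 25, 65, 95, 115, 120},
        };
        Summary s = summarize(perMap, 1.0);
        System.out.println("max=" + s.maxBytesPerSec + " B/s, avg=" + s.avgBytesPerSec + " B/s");
    }
}
```

When the number of samples is not a multiple of three, the window kept here is slightly wider than a strict third. A real implementation would also need to align samples from maps that start at different times, which this sketch does not attempt.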