Date: Tue, 26 Apr 2011 23:47:03 +0000 (UTC)
From: "Aaron T. 
Myers (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <1408418006.4411.1303861623324.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <578270654.68465.1303254126138.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-1846) Don't fill preallocated portion of edits log with 0x00
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025517#comment-13025517 ]

Aaron T. Myers commented on HDFS-1846:
--------------------------------------

@Eli - the first test was indeed done on an SSD. Here are the results of running the test on a spinning HDD:

{noformat}
----------------------------------------------------
Results for classic scheme:
Overall total ops: 100000
Overall total time of all ops: 1024072.0
Overall average time of op: 10.24072
Overall fastest op: 3
Overall slowest op: 178
Preallocation total ops: 23
Preallocation total time of all ops: 871.0
Preallocation average time of op: 37.869565217391305
Preallocation fastest op: 28
Preallocation slowest op: 52
Total time of slowest 1% of ops: 48949.0
Average time of slowest 1% of ops: 48.949
----------------------------------------------------
----------------------------------------------------
Results for new scheme:
Overall total ops: 100000
Overall total time of all ops: 860702.0
Overall average time of op: 8.60702
Overall fastest op: 2
Overall slowest op: 288
Preallocation total ops: 23
Preallocation total time of all ops: 1236.0
Preallocation average time of op: 53.73913043478261
Preallocation fastest op: 41
Preallocation slowest op: 91
Total time of slowest 1% of ops: 36456.0
Average time of slowest 1% of ops: 36.456
----------------------------------------------------
{noformat}

The results are similar to my previous test, just a whole lot slower across the 
board. If anything, the relative improvement for the average op is larger on a spinning disk - from a 5% improvement on an SSD to an 18% improvement on a normal HDD. The average performance degradation of a preallocation-inducing op is also much smaller - from 1200% worse to 42% worse.

Also worth noting that, per an offline suggestion from Todd, I ran this test slightly differently. I ran each test (classic and new schemes) twice, to account for any warm-up time for the various caches involved (disk, JIT, classloading, local FS, etc.). The results I've included here are from the second run of each test.

Here's a diff based off my previous patch:

{code}
index 7e74429..d599224 100644
--- src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
+++ src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestEditLogOutputStream.java
@@ -19,11 +19,13 @@ public class TestEditLogOutputStream {
   @Test
   public void testEditLogOutputStreamPerformanceWithClassicPreallocationScheme() throws IOException {
     performTestAndPrintResults(false);
+    performTestAndPrintResults(false);
   }

   @Test
   public void testEditLogOutputStreamPerformanceWithNewPreallocationScheme() throws IOException {
     performTestAndPrintResults(true);
+    performTestAndPrintResults(true);
   }

   private void performTestAndPrintResults(boolean useNewPreallocationScheme) throws IOException {
@@ -32,6 +34,7 @@ public class TestEditLogOutputStream {

     Configuration conf = new Configuration();
     conf.set(DFSConfigKeys.DFS_PERMISSIONS_ENABLED_KEY, "false");
+    conf.set("hadoop.tmp.dir", "/data/1/atm/edits-log-preallocate-test/tmp");
     FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
     conf.set("dfs.http.address", "127.0.0.1:0");
     File baseDir = new File(conf.get("hadoop.tmp.dir"), "dfs/");
{code}

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
> Key: HDFS-1846
> URL: https://issues.apache.org/jira/browse/HDFS-1846
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Attachments: hdfs-1846-perf-analysis.0.patch, hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for the NN transaction log. That change seeks past the current end of the file and writes out some data, which on most systems results in the intervening data in the file being filled with zeros. Most underlying file systems have special handling for sparse files, and don't actually allocate blocks on disk for blocks of a file which consist completely of 0x00.
> I've seen cases in the wild where the volume an edits dir is on fills up, resulting in a partial final transaction being written out to disk. If you examine the bytes of this (now corrupt) edits file, you'll see the partial final transaction followed by a lot of zeros, suggesting that the preallocation previously succeeded before the volume ran out of space. If we fill the preallocated space with something other than zeros, we'd likely see the failure at preallocation time, rather than transaction-writing time, and so cause the NN to crash earlier, without a partial transaction being written out.
> I also hypothesize that filling the preallocated space in the edits log with something other than 0x00 will result in a performance improvement in NN throughput. I haven't tested this yet, but I intend to as part of this JIRA.
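
For readers who want to see the two preallocation styles from the issue description side by side, here is a minimal sketch in plain Java NIO. It is not the HDFS code and not the attached patch: the class name PreallocateSketch, both method names, the 0xFF fill byte and the 64 KB chunk size are all assumptions made for illustration. The first method mimics "seek past the end and write some data", which may leave a sparse hole; the second writes non-zero bytes across the whole region, so a full volume would be expected to fail here rather than later in the middle of a transaction.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PreallocateSketch {
    // Hypothetical fill value; the only requirement in this sketch is that it is not 0x00.
    static final byte FILL_BYTE = (byte) 0xFF;

    /**
     * Seek-style preallocation: write a single byte at the new end of file.
     * The bytes in between are unspecified and may be stored as a sparse hole.
     * Assumes extraBytes > 0.
     */
    public static void preallocateBySeeking(FileChannel ch, long extraBytes) throws IOException {
        long newSize = ch.size() + extraBytes;
        ch.write(ByteBuffer.wrap(new byte[] {0}), newSize - 1);
    }

    /**
     * Fill-style preallocation: explicitly write non-zero bytes over the whole
     * region, so the file system has to allocate real blocks for it.
     */
    public static void preallocateByFilling(FileChannel ch, long extraBytes) throws IOException {
        byte[] chunk = new byte[64 * 1024];
        Arrays.fill(chunk, FILL_BYTE);
        long pos = ch.size();
        long remaining = extraBytes;
        while (remaining > 0) {
            int n = (int) Math.min(chunk.length, remaining);
            ByteBuffer buf = ByteBuffer.wrap(chunk, 0, n);
            // A positional write may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos);
            }
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits-sketch", ".bin");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            preallocateByFilling(ch, 1024 * 1024);
            System.out.println("size after fill: " + ch.size());
        } finally {
            Files.delete(p);
        }
    }
}
```

Whether the seek-style variant actually produces a sparse file depends on the underlying file system, so the sketch only shows the difference in what is written, not in what ends up allocated on disk.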