From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Wed, 11 Nov 2009 07:46:39 +0000 (UTC)
Subject: [jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

     [ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Attachment: hdfs-755.txt

Here's a fairly small patch which uses the support for reading multiple checksum chunks from HADOOP-3205. I haven't run the full test suite yet, but I got about halfway through and it seems to work - I'll be sure to put it through full testing before it gets committed. I'll also run this on a cluster and get TestDFSIO throughput numbers.

Performance results look to be in line with what we see in HADOOP-3205.

Benchmark setup:
- I put a 700MB file on a pseudo-distributed HDFS cluster.
- I did 30 "fs -cat" runs of this file without the patch applied, and 30 with it applied. In both cases I did a couple of cats first to make sure the file was in the buffer cache. I can run another set of benchmarks that drops the cache in between runs if people would like.
- In both benchmark cases, the patch from HADOOP-3205 was applied. I used a 64K io.file.buffer.size for both the DN and the client.

T-test results (alternative hypothesis = "with patch is faster"):
- Wall clock time: p-value = 2.644e-07 -> effectively 100% confidence. 95% confidence interval of 3.4% speedup
- User time: p-value = 1.638e-10 -> effectively 100% confidence. 95% confidence interval of 3.9% speedup
- Sys time: p-value = 0.982 - that is to say, above 95% confidence that we *slowed down* sys time. The confidence interval is about 0.7%

The 95% confidence intervals in this benchmark are less impressive sounding than the ones in HADOOP-3205 because I used fewer samples.

As to why sys time increased, it's a bit of a mystery. My best guess is that, since we're now reading from the network sockets in larger chunks, we occasionally block in the kernel where we used to pretty much always read from a full buffer. But this isn't too concerning - the wall clock time is what really matters.
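For anyone who wants to repeat the comparison on their own timings, here is a rough sketch of the kind of two-sample test described above. It assumes Welch's unequal-variance form, which may not be the exact variant used for the numbers in this comment, and the names `welch_t` and `relative_speedup` are made up for illustration. It returns only the t statistic and degrees of freedom; turning those into a p-value needs a t-distribution CDF from a statistics package. Note that for a one-sided test, the p-value for the opposite hypothesis is 1 - p, which is how p = 0.982 for "patch is faster" reads as roughly p = 0.018 for "patch is slower".

```python
import math
import statistics


def welch_t(baseline, patched):
    """Return (t, df) for the hypothesis mean(patched) < mean(baseline).

    A large positive t supports "with patch is faster". Welch's form is an
    assumption here; it does not require equal variances in the two groups.
    """
    n1, n2 = len(baseline), len(patched)
    m1, m2 = statistics.mean(baseline), statistics.mean(patched)
    v1, v2 = statistics.variance(baseline), statistics.variance(patched)

    se2 = v1 / n1 + v2 / n2                      # squared standard error
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def relative_speedup(baseline, patched):
    """Fraction by which the mean patched time is below the mean baseline."""
    m1 = statistics.mean(baseline)
    return (m1 - statistics.mean(patched)) / m1


if __name__ == "__main__":
    # Toy timings in seconds, invented for illustration only.
    # These are NOT the measurements from this benchmark.
    without_patch = [10.0, 10.2, 9.8, 10.1, 9.9]
    with_patch = [9.6, 9.7, 9.5, 9.8, 9.4]
    t, df = welch_t(without_patch, with_patch)
    print(f"t = {t:.3f}, df = {df:.1f}, "
          f"speedup = {relative_speedup(without_patch, with_patch):.1%}")
```

Feeding it the 30 wall clock, user, and sys times from each configuration would give one (t, df) pair per metric.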
> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum chunks in a single call to readChunk. This is the HDFS-side use of that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
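As a rough illustration of the idea in the issue description (one read call returning several checksum chunks, each still verified separately), here is a minimal standalone sketch. It is not the Hadoop code: `checksum_chunks`, `verify_chunks`, and the 512-byte chunk size are hypothetical choices for the example, and CRC32 from zlib stands in for whatever checksum the real stream uses.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size, for illustration only


def checksum_chunks(data, chunk_size=BYTES_PER_CHECKSUM):
    """Compute one CRC32 per chunk of `data`; the last chunk may be short."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def verify_chunks(data, checksums, chunk_size=BYTES_PER_CHECKSUM):
    """Verify a buffer holding several chunks that was read in one call.

    Returns the index of the first corrupt chunk, or -1 if all chunks match.
    """
    expected_count = (len(data) + chunk_size - 1) // chunk_size
    if expected_count != len(checksums):
        raise ValueError("checksum count does not match number of chunks")
    for idx, expected in enumerate(checksums):
        start = idx * chunk_size
        if zlib.crc32(data[start:start + chunk_size]) != expected:
            return idx
    return -1


if __name__ == "__main__":
    payload = bytes(range(256)) * 5          # 1280 bytes -> 3 chunks
    sums = checksum_chunks(payload)
    print(verify_chunks(payload, sums))      # all chunks intact

    corrupted = bytearray(payload)
    corrupted[600] ^= 0xFF                   # flip bits inside the second chunk
    print(verify_chunks(bytes(corrupted), sums))
```

The point of batching is that the per-call overhead is paid once per buffer rather than once per chunk, while corruption can still be localized to a single chunk.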