Date: Fri, 20 Jun 2014 17:53:25 +0000 (UTC)
From: "James Thomas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-6560) Byte array native checksumming on DN side

[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas updated HDFS-6560:
-------------------------------
    Attachment: HDFS-3528.patch

Ran some basic performance tests on a 10^8 byte data array. All listed times are for a single call to verifyChunkedSums, averaged over 20 runs.
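For readers who want a feel for what is being timed, here is a minimal sketch of a chunked checksum verification over a byte array, averaged over several runs. It is an assumption-laden illustration, not the patch: it uses the JDK's java.util.zip.CRC32 rather than Hadoop's DataChecksum or the native code, and the class name, method signatures, and the 512-byte chunk size are all hypothetical choices made for the example.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Illustrative only: NOT Hadoop's DataChecksum and NOT the native code from the patch.
final class ChunkedCrcSketch {

    // Number of chunks needed to cover length bytes (the last chunk may be short).
    static int chunkCount(int length, int bytesPerChecksum) {
        return (length + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    // Computes one checksum per bytesPerChecksum-sized chunk of data.
    static int[] calculateChunkedSums(Checksum crc, byte[] data, int bytesPerChecksum) {
        int[] sums = new int[chunkCount(data.length, bytesPerChecksum)];
        for (int i = 0; i < sums.length; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = (int) crc.getValue();
        }
        return sums;
    }

    // Recomputes each chunk's checksum and compares it with the stored value.
    // Returns the index of the first mismatching chunk, or -1 if all chunks match.
    static int verifyChunkedSums(Checksum crc, byte[] data, int[] sums, int bytesPerChecksum) {
        if (sums.length != chunkCount(data.length, bytesPerChecksum)) {
            throw new IllegalArgumentException("checksum count does not match data length");
        }
        for (int i = 0; i < sums.length; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            if ((int) crc.getValue() != sums[i]) {
                return i;
            }
        }
        return -1;
    }

    // Average wall-clock time in milliseconds of one verification pass over data.
    static double averageVerifyMillis(Checksum crc, byte[] data, int[] sums,
                                      int bytesPerChecksum, int runs) {
        long totalNanos = 0;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            int bad = verifyChunkedSums(crc, data, sums, bytesPerChecksum);
            totalNanos += System.nanoTime() - start;
            if (bad != -1) {
                throw new IllegalStateException("checksum mismatch in chunk " + bad);
            }
        }
        return totalNanos / 1e6 / runs;
    }

    public static void main(String[] args) {
        // 10^8 bytes and 20 runs mirror the test described above; 512 is an assumed chunk size.
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        int bytesPerChecksum = 512;
        byte[] data = new byte[size];
        new Random(0).nextBytes(data);

        Checksum crc = new CRC32();
        int[] sums = calculateChunkedSums(crc, data, bytesPerChecksum);
        verifyChunkedSums(crc, data, sums, bytesPerChecksum); // one untimed warm-up pass
        double avg = averageVerifyMillis(crc, data, sums, bytesPerChecksum, 20);
        System.out.printf("CRC32, byte array, pure Java: %.1f ms per pass%n", avg);
    }
}
```

Because the methods take a java.util.zip.Checksum, a java.util.zip.CRC32C instance (available from Java 9) could be passed in instead to time the other polynomial. Timings from this sketch will not match the numbers in this message, since the hardware, JVM, and checksum implementation all differ.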
Direct buffer with existing native implementation for direct buffers:
-Time for CRC32: 56.5 ms
-Time for CRC32C: 7.3 ms

Direct buffer with Java implementation:
-Time for CRC32: 81.8 ms
-Time for CRC32C: 82.5 ms

Byte array with native implementation developed in this patch:
-Time for CRC32: 55.0 ms
-Time for CRC32C: 7.63 ms

Byte array with Java implementation:
-Time for CRC32: 74.4 ms
-Time for CRC32C: 74.7 ms

So it seems like the native byte array implementation is essentially as fast as the direct buffer equivalent.

Next, I ran a test on a single-node cluster (DN had 10 spinning disks) where I wrote a 1 GB file (128 MB block size, all other cluster defaults in place). Averages over 20 runs:

Without change: 128.3 MB/s
With change: 128.4 MB/s

The difference here is not significant. This matches up with Trevor Robinson's results from HDFS-3529 (he refactored write-side code to use direct buffers so that the direct buffer-based native implementation could be used). He saw a significant performance improvement in a setup with SSD drives, so I assume I would see a similar improvement here as well. Once there is some discussion on HDFS-6561, I can try to implement client-side native checksumming and see if that changes things.

> Byte array native checksumming on DN side
> -----------------------------------------
>
>                 Key: HDFS-6560
>                 URL: https://issues.apache.org/jira/browse/HDFS-6560
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-3528.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)