Date: Thu, 11 Sep 2014 20:48:34 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-11927) If java7, use zip crc

     [ https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-11927:
--------------------------
    Attachment: crc32ct.svg

So, messing with compactiontool on an HDFS cluster, I see 20% of CPU given over to creating checksums but none to verifying them! What's up? Turns out that in this 'tool' context, HDFS is doing the verification of the checksums. I am running on top of branch-2 HDFS, so I have the latest native CRC improvements. You can see the native calls if you try hard. They are to the right of the second peak in this flame graph.
There are not many samples, but it's showing as 0.9 percent, as opposed to the 20% you can see in the first graphs I posted, where the flame graphs were taken against a running HBase regionserver. Let me see if I can get HBase to use native checksum creation and verification when it is available.

> If java7, use zip crc
> ---------------------
>
>                 Key: HBASE-11927
>                 URL: https://issues.apache.org/jira/browse/HBASE-11927
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.99.1
>
>         Attachments: c2021.crc2.svg, c2021.write.2.svg, c2021.zip.svg, crc32ct.svg
>
>
> Up in Hadoop they have this change. Let me publish some graphs to show that it makes a difference (CRC is a massive amount of our CPU usage in my profiling of an upload because of compacting, flushing, etc.). We should also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in HBase, but that is another issue for now.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
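
For reference, the "zip crc" in the issue title presumably means the JDK's java.util.zip.CRC32. The sketch below only shows how that class checksums a buffer in chunks; it is not the HBase patch. The class name ZipCrcSketch, the crc32 helper, and the chunk size are hypothetical, chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ZipCrcSketch {
    // Hypothetical helper: feed the buffer to the JDK CRC32 in fixed-size
    // chunks, roughly the way a writer would checksum data as it streams out.
    // chunkSize must be greater than zero.
    static long crc32(byte[] data, int chunkSize) {
        Checksum crc = new CRC32();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            crc.update(data, off, len);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC-32 check input.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Long.toHexString(crc32(data, 4))); // prints cbf43926
    }
}
```

Chunked updates give the same value as a single update over the whole buffer, so the chunk size only affects how the work is split, not the result.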