From: "Kannan Muthukkaruppan (JIRA)"
To: issues@hbase.apache.org
Date: Mon, 11 Oct 2010 15:31:33 -0400 (EDT)
Subject: [jira] Commented: (HBASE-3103) investigate/improve compaction performance
Message-ID: <9059883.82011286825493546.JavaMail.jira@thor>
In-Reply-To: <14369954.81421286823573954.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919959#action_12919959 ]

Kannan Muthukkaruppan commented on HBASE-3103:
----------------------------------------------

Karthik pointed out that the HFile.getReadTime()/HFile.getWriteTime() values I was using aren't reliable for timing DFS operations, since those values get reset every 5 seconds. So ignore the part of the description that says the DFS times seem low.

We did notice, though, that one CPU was pegged during the entire operation: about 70% system CPU and about 30% user CPU.

Cpu13 : 29.5%us, 70.2%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

I'm wondering whether the additional nanoTime() calls themselves are hurting, so I will try taking the instrumentation out and profiling in sampling mode.

> investigate/improve compaction performance
> ------------------------------------------
>
>                 Key: HBASE-3103
>                 URL: https://issues.apache.org/jira/browse/HBASE-3103
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>
> I was running some tests and am seeing that major compacting about 100M of data seems to take around 40-50 seconds.
> My simplified test case is something like:
> * Created about a 100M store file (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 bytes)
> * Compression and ROWCOL bloom were turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction scanTime (ns) 4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction append time (ns) 23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() themselves seem pretty low (under 1 second levels).
> These are the times for the parts that interact with the DFS (readBlock() and finishBlock() mostly).
> Are these numbers roughly in line with what others are seeing normally?
> Will double check my instrumentation, and try to get more data. Might try to run it under a profiler. But wanted to put it out there for additional input/ideas on improvement.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
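
As a rough cross-check of the numbers in this thread, the Java sketch below sums the three logged stage times and estimates how much of that could plausibly be timer overhead. The class and method names (CompactionTimingSketch, timerCalls, measureNanoTimeCostNs) are hypothetical and not part of HBase. The instrumented code is not shown in the thread, so the sketch assumes one start/stop pair of nanoTime() calls per stage per cell; the real instrumentation may differ.

```java
final class CompactionTimingSketch {
    // Stage totals copied from the three log lines in the issue description (nanoseconds).
    static final long SCAN_NS = 4_338_103_000L;
    static final long BLOOM_NS = 14_433_821_000L;
    static final long APPEND_NS = 23_191_478_000L;

    // Sum of the three instrumented stages.
    static long totalNanos() {
        return SCAN_NS + BLOOM_NS + APPEND_NS;
    }

    static double toSeconds(long nanos) {
        return nanos / 1e9;
    }

    // ASSUMPTION: each stage is wrapped in one start/stop pair of
    // System.nanoTime() calls for every cell written by the compaction.
    static long timerCalls(long rows, long columnsPerRow, int stages) {
        return rows * columnsPerRow * stages * 2L;
    }

    // Rough average cost of one System.nanoTime() call on the current machine.
    // This is a crude loop, not a rigorous benchmark.
    static double measureNanoTimeCostNs(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(); // keeps the loop result live so the JIT cannot drop it
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        System.out.printf("scan   %.3f s%n", toSeconds(SCAN_NS));
        System.out.printf("bloom  %.3f s%n", toSeconds(BLOOM_NS));
        System.out.printf("append %.3f s%n", toSeconds(APPEND_NS));
        System.out.printf("total  %.3f s%n", toSeconds(totalNanos()));

        // 10k keys with 1k columns each, three instrumented stages.
        long calls = timerCalls(10_000, 1_000, 3);
        double perCallNs = measureNanoTimeCostNs(5_000_000);
        double overheadSeconds = calls * perCallNs / 1e9;
        System.out.printf("%d timer calls at ~%.1f ns each: ~%.2f s%n",
                calls, perCallNs, overheadSeconds);
    }
}
```

The three stages sum to 41,963,402,000 ns, or about 42 seconds, which is consistent with the 40-50 seconds reported for the whole compaction. Under the per-cell assumption there are 60 million timer calls: at a hypothetical 25 ns per call that is about 1.5 s, while at 300 ns per call it would be about 18 s, so the per-call cost measured on the region server's own hardware decides whether the instrumentation is a meaningful part of the total.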