hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3103) investigate/improve compaction performance
Date Mon, 11 Oct 2010 19:31:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919959#action_12919959 ]

Kannan Muthukkaruppan commented on HBASE-3103:

Karthik pointed out that the HFile.getReadTime()/HFile.getWriteTime() values I was using aren't
reliable as DFS timers, because those values get reset every 5 seconds. So please ignore the part
of the issue description that says the DFS times seem low.

We did notice, though, that one CPU was pegged during the entire operation: about 70% was system
CPU and about 30% was user CPU.

Cpu13 : 29.5%us, 70.2%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

I'm wondering whether the additional nanoTime() calls themselves are hurting, so I will try taking
the instrumentation out and profiling in sampling mode instead.
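
For a rough sense of whether the timer calls could matter: the test case in the issue description has about 10k x 1k = 10M cells. The sketch below assumes each of the three stages is wrapped in a start/stop nanoTime() pair once per cell, which may not match the actual instrumentation. NanoTimeOverhead is a hypothetical helper, not part of HBase. It estimates the per-call cost of System.nanoTime() on the local JVM and extrapolates to that call count.

```java
// Hypothetical helper, not part of HBase: estimates what System.nanoTime() calls cost.
final class NanoTimeOverhead {

    // Average cost in nanoseconds of one System.nanoTime() call, measured over `calls` iterations.
    static double averageCostNanos(int calls) {
        long sink = 0; // accumulate results so the JIT cannot discard the calls
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(); // practically never taken; keeps `sink` live
        }
        return (double) elapsed / calls;
    }

    // Timer calls needed when every cell gets a start/stop pair for each stage (an assumption).
    static long timerCalls(long rows, long columnsPerRow, int stages) {
        return rows * columnsPerRow * stages * 2;
    }

    public static void main(String[] args) {
        averageCostNanos(1_000_000); // warm-up pass
        double perCall = averageCostNanos(10_000_000);
        long calls = timerCalls(10_000, 1_000, 3);
        System.out.printf("nanoTime(): ~%.1f ns/call, %d calls, ~%.2f s total%n",
                perCall, calls, perCall * calls / 1e9);
    }
}
```

The per-call cost depends heavily on the OS clock source, so the result is only meaningful when measured on the same machine as the compaction test.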

> investigate/improve compaction performance
> ------------------------------------------
>                 Key: HBASE-3103
>                 URL: https://issues.apache.org/jira/browse/HBASE-3103
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
> I was running some tests and am seeing that major compacting about 100M of data seems
> to take around 40-50 seconds.
> My simplified test case is something like:
> * Created a store file of about 100M (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 bytes)
> * Compression and ROWCOL bloom were turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction scanTime (ns)         4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction append time (ns)     23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() values themselves seem pretty low (under 1
> second). These are the times for the parts that interact with the DFS (mostly readBlock()
> and finishBlock()).
> Are these numbers roughly in line with what others are seeing normally? 
> Will double-check my instrumentation and try to get more data. Might try to run it
> under a profiler. But wanted to put it out there for additional input/ideas on improvement.
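
As a quick check on the three stage timings quoted in the description, the sketch below converts them from nanoseconds to seconds and sums them. It assumes the three timers do not overlap, which the description does not confirm (the bloom logic may run inside the append path). CompactionStageTimes is a hypothetical helper, not part of HBase.

```java
// Hypothetical helper for reading the quoted log lines; not part of HBase.
final class CompactionStageTimes {
    // Values copied from the three "major Compaction" log lines in the description.
    static final long SCAN_NS = 4_338_103_000L;
    static final long BLOOM_NS = 14_433_821_000L;
    static final long APPEND_NS = 23_191_478_000L;

    // Sum of the three stages, assuming the timers do not overlap.
    static long totalNanos() {
        return SCAN_NS + BLOOM_NS + APPEND_NS;
    }

    static double seconds(long nanos) {
        return nanos / 1e9;
    }

    // Share of the summed time spent in one stage, in percent.
    static double sharePercent(long stageNanos) {
        return 100.0 * stageNanos / totalNanos();
    }

    public static void main(String[] args) {
        System.out.printf("scan   %.2f s (%.1f%%)%n", seconds(SCAN_NS), sharePercent(SCAN_NS));
        System.out.printf("bloom  %.2f s (%.1f%%)%n", seconds(BLOOM_NS), sharePercent(BLOOM_NS));
        System.out.printf("append %.2f s (%.1f%%)%n", seconds(APPEND_NS), sharePercent(APPEND_NS));
        System.out.printf("total  %.2f s%n", seconds(totalNanos()));
    }
}
```

Under that assumption the stages add up to about 42 seconds, which is in line with the 40-50 seconds reported for the whole compaction.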

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
