hbase-issues mailing list archives

From "Kannan Muthukkaruppan (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-3103) investigate/improve compaction performance
Date Mon, 11 Oct 2010 18:59:33 GMT
investigate/improve compaction performance
------------------------------------------

                 Key: HBASE-3103
                 URL: https://issues.apache.org/jira/browse/HBASE-3103
             Project: HBase
          Issue Type: Improvement
            Reporter: Kannan Muthukkaruppan


I was running some tests and am seeing that major compacting about 100MB of data takes
around 40-50 seconds.

My simplified test case is something like:

* Created a single store file of about 100MB (800MB uncompressed).
* 10k rows with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 bytes).
* Compression and ROWCOL bloom filters were turned on.
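
As a rough sanity check on these sizes: 10k rows with 1k columns each is 10M cells, and 10M x (30 + 45) bytes is about 750MB, which is in the same range as the ~800MB uncompressed figure. The sketch below makes that arithmetic explicit. It assumes that "key size" covers the whole serialized key and ignores per-cell length prefixes, block headers, and index/bloom data. Decimal megabytes are used, and the class and method names are hypothetical.

{code:java}
class DatasetSizeEstimate {
    // Hypothetical helper: raw key+value bytes for rows x colsPerRow cells.
    // Ignores per-cell length prefixes, block headers, and index/bloom blocks.
    static long estimateBytes(long rows, long colsPerRow, long avgKeyBytes, long avgValueBytes) {
        long cells = rows * colsPerRow;
        return cells * (avgKeyBytes + avgValueBytes);
    }

    public static void main(String[] args) {
        long rows = 10_000L;
        long cols = 1_000L;
        long bytes = estimateBytes(rows, cols, 30, 45);
        System.out.println("cells = " + rows * cols);
        // Decimal megabytes (10^6 bytes), not MiB.
        System.out.println("approx. raw size = " + bytes / 1_000_000L + " MB");
    }
}
{code}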

The test was to major compact this single store file into a new file.

I added System.nanoTime() calls around these three stages:

* Scanner.next() operations
* Bloom computation logic in StoreFile:append()
* StoreFile.Writer.append()
This is what I saw for these three stages:

{code}
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction scanTime (ns)         4338103000
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction bloom only time (ns) 14433821000
2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: major Compaction append time (ns)     23191478000
{code}
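
In more readable units, the logged values are about 4.34 s scanning, 14.43 s in the bloom logic, and 23.19 s appending. The cell count is not measured. Deriving it from the test description (10k rows x 1k columns = 10M cells) gives roughly 0.43, 1.44 and 2.32 microseconds per cell. If the three timed regions are disjoint, they add up to about 41.96 s, consistent with the 40-50 s total. If the bloom timing is nested inside the append timing, the values should not be summed. A small sketch of the arithmetic:

{code:java}
class CompactionTimeBreakdown {
    // Values copied from the log output (nanoseconds).
    static final long SCAN_NS = 4_338_103_000L;
    static final long BLOOM_NS = 14_433_821_000L;
    static final long APPEND_NS = 23_191_478_000L;
    // Assumption: 10k rows x 1k columns, per the test description.
    static final long CELLS = 10_000L * 1_000L;

    static double seconds(long ns) {
        return ns / 1e9;
    }

    static double microsPerCell(long ns) {
        return ns / 1e3 / CELLS;
    }

    public static void main(String[] args) {
        String[] names = {"scan", "bloom", "append"};
        long[] ns = {SCAN_NS, BLOOM_NS, APPEND_NS};
        for (int i = 0; i < ns.length; i++) {
            System.out.printf("%-6s %6.2f s  %5.2f us/cell%n",
                names[i], seconds(ns[i]), microsPerCell(ns[i]));
        }
        // Only meaningful if the three timed regions do not overlap.
        System.out.printf("sum    %6.2f s%n", seconds(SCAN_NS + BLOOM_NS + APPEND_NS));
    }
}
{code}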

HFile.getReadTime() and HFile.getWriteTime() themselves seem pretty low (under 1 second).
These cover the parts that interact with the DFS (mostly readBlock() and finishBlock()).

Are these numbers roughly in line with what others normally see?

I will double-check my instrumentation and try to get more data, and might also run it under
a profiler. But I wanted to put this out there for additional input and ideas on improvement.



