hbase-dev mailing list archives

From "Jeff Whiting (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-2646) Compaction requests should be prioritized to prevent blocking
Date Tue, 01 Jun 2010 18:43:39 GMT
Compaction requests should be prioritized to prevent blocking
-------------------------------------------------------------

                 Key: HBASE-2646
                 URL: https://issues.apache.org/jira/browse/HBASE-2646
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.20.4
         Environment: ubuntu server 10; hbase 0.20.4; 4 machine cluster (each machine is an
8 core xeon with 16 GB of ram and 6TB of storage); ~250 Million rows;
            Reporter: Jeff Whiting


While testing the write capacity of a 4-machine HBase cluster, we were getting long and frequent
client pauses as we attempted to load the data.  Looking into the problem, we'd see a relatively
large compaction queue, and when a region hit the "hbase.hstore.blockingStoreFiles" limit it
would block the client while its compaction request was put at the back of the queue, behind
many other less important compactions.  The client is basically stuck at that point until the
compaction is done.  Prioritizing compaction requests, letting the request that is blocking
other actions go first, would help solve the problem.
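
Not real HBase code, but to illustrate what we have in mind (all class and field names below
are made up), a priority queue that sorts blocked regions' requests first would let the urgent
compaction jump ahead of the routine ones:
{noformat}
import java.util.concurrent.PriorityBlockingQueue;

/**
 * Minimal sketch of the idea, with made-up names. Requests from
 * regions that are blocking writes sort ahead of routine ones; the
 * timestamp keeps FIFO order within a priority level.
 */
public class PrioritizedCompactionRequest
    implements Comparable<PrioritizedCompactionRequest> {

  /** Lower ordinal is served first. */
  enum Priority { BLOCKING, NORMAL }

  final String regionName;
  final Priority priority;
  final long enqueueTime = System.nanoTime();

  PrioritizedCompactionRequest(String regionName, Priority priority) {
    this.regionName = regionName;
    this.priority = priority;
  }

  @Override
  public int compareTo(PrioritizedCompactionRequest other) {
    int byPriority = priority.compareTo(other.priority);
    return byPriority != 0 ? byPriority
                           : Long.compare(enqueueTime, other.enqueueTime);
  }

  public static void main(String[] args) throws InterruptedException {
    PriorityBlockingQueue<PrioritizedCompactionRequest> queue =
        new PriorityBlockingQueue<PrioritizedCompactionRequest>();
    queue.put(new PrioritizedCompactionRequest("regionA", Priority.NORMAL));
    queue.put(new PrioritizedCompactionRequest("regionB", Priority.BLOCKING));
    // regionB jumps the queue because it is blocking client writes.
    System.out.println(queue.take().regionName); // prints "regionB"
  }
}
{noformat}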

You can see the problem by looking at our log files:

You'll first see an event such as a "Too many hlogs" message, which will put a lot of requests
on the compaction queue:
{noformat}
2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=33,
maxlogs=32; forcing flush of 22 regions(s): responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324,
responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592,
responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873,
responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747,
responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930,
responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659,
responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319,
responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543,
responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586,
responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718,
responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817,
responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952,
responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557,
responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940,
responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013,
responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756,
responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186,
responses,RS_a3LJVxwTj29MHVa-12742
{noformat}

Then you see the region being put back on the flush queue because it has too many store files:
{noformat} 
2010-05-25 10:53:31,364 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction
requested for region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138
because: regionserver/192.168.0.81:60020.cacheFlusher
2010-05-25 10:53:32,364 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many store files,
putting it back at the end of the flush queue.
{noformat}
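
The deferral above comes from comparing the region's store-file count against
"hbase.hstore.blockingStoreFiles".  A minimal sketch of that check (the class and method
names are hypothetical, and 7 is the common default rather than a value read from our config):
{noformat}
public class FlushGateSketch {
  static final int BLOCKING_STORE_FILES = 7; // hbase.hstore.blockingStoreFiles

  static boolean flushMustWait(int storeFileCount) {
    // The cache flusher re-queues the region instead of flushing it,
    // so its memstore keeps growing until a compaction brings the
    // store-file count back under the limit.
    return storeFileCount >= BLOCKING_STORE_FILES;
  }

  public static void main(String[] args) {
    System.out.println(flushMustWait(8)); // true: the flush is deferred
  }
}
{noformat}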

Which leads to this: 
{noformat}
2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
for 'IPC Server handler 60 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862:
memstore size 128.0m is >= than blocking 128.0m size
2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
for 'IPC Server handler 84 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862:
memstore size 128.0m is >= than blocking 128.0m size
2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates
for 'IPC Server handler 1 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862:
memstore size 128.0m is >= than blocking 128.0m size
{noformat}
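
The 128.0m figure in those lines is the memstore blocking threshold: the flush size times the
block multiplier.  A sketch of the arithmetic, assuming the usual 0.20-era defaults
(hbase.hregion.memstore.flush.size = 64m, hbase.hregion.memstore.block.multiplier = 2; not
read from the actual HRegion code):
{noformat}
public class MemstoreBlockSketch {
  static final long FLUSH_SIZE = 64L * 1024 * 1024;   // assumed default, 64m
  static final long BLOCKING_SIZE = FLUSH_SIZE * 2;   // 128m, as in the logs

  static boolean shouldBlockUpdates(long memstoreSizeBytes) {
    // Updates stay blocked until a flush shrinks the memstore, and the
    // flush itself is stuck behind the compaction queue (see above).
    return memstoreSizeBytes >= BLOCKING_SIZE;
  }

  public static void main(String[] args) {
    System.out.println(shouldBlockUpdates(128L * 1024 * 1024)); // true
  }
}
{noformat}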



Once the compaction / split is done, a flush is able to happen, which unblocks the IPC and
allows writes to continue.  Unfortunately this process can take upwards of 15 minutes (the
specific case shown here from our logs took about 4 minutes).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

