hbase-issues mailing list archives

From "Jeff Whiting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2646) Compaction requests should be prioritized to prevent blocking
Date Tue, 01 Jun 2010 21:43:37 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874255#action_12874255 ]

Jeff Whiting commented on HBASE-2646:
-------------------------------------

Your plan sounds good.  There is a lot going on with the master rewrite and a lot of variables up in the air, so it seems good to push this out to 0.21.  Although I plan to keep using it in my environment ;-)

I agree that in a situation where there aren't enough resources we should just focus on high priority items and not worry about starvation of low priority items.  I also agree that priorityElevationTime would be a tricky parameter to tune and could make the priority queue worthless.  Besides, if a low priority item somehow becomes critical it will naturally be re-requested with a higher priority.  As for the LOW priority, I don't really know when it would be used.  I included it to provide a more complete implementation, thinking that someone else might have a good reason to use it.
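
For reference, a minimal sketch of such a priority queue (illustrative class and method names only, not the code in prioritycompactionqueue-0.20.4.patch): requests are ordered by priority, an arrival sequence number keeps equal-priority requests FIFO, and a re-request at a higher priority simply sorts ahead of the original.

{code:java}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a prioritized compaction queue; names are
// illustrative, not the ones used in the attached patch.
public class PriorityCompactionQueue {
  public enum Priority { HIGH, NORMAL, LOW }   // lower ordinal is served first

  private static final AtomicLong SEQ = new AtomicLong();

  public static class Request implements Comparable<Request> {
    final String regionName;
    final Priority priority;
    final long seq = SEQ.getAndIncrement();    // FIFO tiebreaker within a priority

    Request(String regionName, Priority priority) {
      this.regionName = regionName;
      this.priority = priority;
    }

    @Override
    public int compareTo(Request other) {
      int byPriority = this.priority.compareTo(other.priority);
      return byPriority != 0 ? byPriority : Long.compare(this.seq, other.seq);
    }
  }

  private final PriorityBlockingQueue<Request> queue =
      new PriorityBlockingQueue<Request>();

  /** Called by the flusher / region server when a compaction is wanted. */
  public void requestCompaction(String regionName, Priority priority) {
    queue.offer(new Request(regionName, priority));
  }

  /** Called by the compaction thread; blocks until a request is available. */
  public Request take() throws InterruptedException {
    return queue.take();
  }
}
{code}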

Here is a random thought on when a low priority compaction would make sense.  Let's say a region just barely went above "hbase.hstore.compactionThreshold" (t) but was still a long way off from "hbase.hstore.blockingStoreFiles" (b), so it would request a compaction, but at low priority.  Once it got halfway between compactionThreshold and blockingStoreFiles, (b+t) / 2, it would re-request the compaction at normal priority.  I'm not really sure how beneficial this would be, but it's the only thing coming to mind as to why you'd use a low priority.
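
As a rough illustration of that escalation rule (hypothetical class and method; t and b stand for the two configuration values named above):

{code:java}
// Hypothetical mapping from store-file count to compaction priority, where
// t = hbase.hstore.compactionThreshold and b = hbase.hstore.blockingStoreFiles.
public class CompactionPriorityHeuristic {
  public enum Priority { HIGH, NORMAL, LOW, NONE }

  public static Priority priorityFor(int storeFiles, int t, int b) {
    if (storeFiles >= b) {
      return Priority.HIGH;      // at or over the blocking limit: writes are stalling
    } else if (storeFiles >= (b + t) / 2) {
      return Priority.NORMAL;    // halfway between the threshold and the blocking limit
    } else if (storeFiles > t) {
      return Priority.LOW;       // just barely over the compaction threshold
    }
    return Priority.NONE;        // below the threshold: no compaction requested
  }
}
{code}

For example, with (say) t=3 and b=7, a region with 4 store files would be queued LOW and re-requested at NORMAL once it reached 5.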

> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
>                 Key: HBASE-2646
>                 URL: https://issues.apache.org/jira/browse/HBASE-2646
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.4
>         Environment: ubuntu server 10; hbase 0.20.4; 4 machine cluster (each machine is an 8 core xeon with 16 GB of ram and 6TB of storage); ~250 Million rows;
>            Reporter: Jeff Whiting
>            Priority: Critical
>             Fix For: 0.21.0
>
>         Attachments: prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4 machine hbase cluster we were getting long and frequent client pauses as we attempted to load the data.  Looking into the problem we'd get a relatively large compaction queue, and when a region hit the "hbase.hstore.blockingStoreFiles" limit it would block the client and the compaction request would get put at the back of the queue, waiting behind many other less important compactions.  The client is basically stuck at that point until a compaction is done.  Prioritizing the compaction requests and allowing the request that is blocking other actions to go first would help solve the problem.
> You can see the problem by looking at our log files:
> You'll first see an event such as a "too many hlogs" message, which will put a lot of requests on the compaction queue.
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s): responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324, responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592, responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873, responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747, responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930, responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659, responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319, responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543, responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586, responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718, responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817, responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952, responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557, responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940, responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013, responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756, responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186, responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see the "too many store files" messages:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this: 
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 60 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 84 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done a flush is able to happen, which unblocks the IPC and allows writes to continue.  Unfortunately this process can take upwards of 15 minutes (the specific case shown here from our logs took about 4 minutes).
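
To tie the quoted report back to the proposal discussed above, here is a small runnable toy example (using the hypothetical PriorityCompactionQueue sketch from earlier in this message, not the attached patch) showing how a prioritized queue would serve the blocked region's request ahead of the backlog created by the "too many hlogs" flushes:

{code:java}
// Toy demonstration: 22 ordinary requests are queued first (as after the
// "Too many hlogs" event), then one request for a region whose flush is
// blocked.  The blocked region's request is taken first.
public class PriorityQueueDemo {
  public static void main(String[] args) throws InterruptedException {
    PriorityCompactionQueue queue = new PriorityCompactionQueue();

    for (int i = 0; i < 22; i++) {
      queue.requestCompaction("responses,RS_region" + i,
          PriorityCompactionQueue.Priority.NORMAL);
    }
    // The region that has hit blockingStoreFiles and is stalling client writes.
    queue.requestCompaction("responses-index,blocked-region",
        PriorityCompactionQueue.Priority.HIGH);

    // Prints the blocked region first, then the NORMAL requests in FIFO order.
    for (int i = 0; i < 3; i++) {
      System.out.println(queue.take().regionName);
    }
  }
}
{code}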

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

