From: "Jeff Whiting (JIRA)"
To: issues@hbase.apache.org
Date: Tue, 1 Jun 2010 18:27:36 -0400 (EDT)
Subject: [jira] Commented: (HBASE-2646) Compaction requests should be prioritized to prevent blocking

    [ https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874283#action_12874283 ]

Jeff Whiting commented on HBASE-2646:
-------------------------------------

I agree that for compactions, having more than two priorities is overkill and bound to cause problems. In the general case, having a low-priority level in a priority queue would make sense.

I'm glad to hear about concurrent compactions and flushes, as that would be a huge improvement. Also, has there been any talk of multiple compaction threads per region server? That way multiple regions could be compacted in parallel. You'd have to move the compaction queue to a more global location and have all the threads pull from one source. On lower-end machines, running compactions in parallel may not make sense, but on higher-end machines it could pay dividends. Looking at my cluster (8-core Xeons, 16 GB RAM, JBOD 3 x 2 TB), we almost always have some resources available to do parallel compactions. A rough sketch of what that could look like follows.
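[Editorial note: to make the two suggestions concrete -- two priority levels, plus several worker threads draining one shared queue -- here is a minimal, hedged sketch in Java. It is not the attached prioritycompactionqueue-0.20.4.patch and not HBase code; every name in it (PriorityCompactionQueue, CompactionRequest, Pri) is hypothetical.]

{noformat}
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not HBase source: a two-priority compaction queue
// shared by several worker threads.
public class PriorityCompactionQueue {

    /** Two levels only: requests that are blocking writers jump the queue. */
    public enum Pri { BLOCKING, NORMAL }

    /** A queued request; the sequence number keeps FIFO order within a level. */
    static class CompactionRequest implements Comparable<CompactionRequest> {
        final String region;
        final Pri pri;
        final long seq;

        CompactionRequest(String region, Pri pri, long seq) {
            this.region = region;
            this.pri = pri;
            this.seq = seq;
        }

        public int compareTo(CompactionRequest o) {
            int byPri = pri.compareTo(o.pri);   // BLOCKING sorts ahead of NORMAL
            if (byPri != 0) return byPri;
            return seq < o.seq ? -1 : (seq > o.seq ? 1 : 0);
        }
    }

    private final PriorityBlockingQueue<CompactionRequest> queue =
        new PriorityBlockingQueue<CompactionRequest>();
    private final AtomicLong seq = new AtomicLong();

    public void requestCompaction(String region, Pri pri) {
        queue.add(new CompactionRequest(region, pri, seq.incrementAndGet()));
    }

    public static void main(String[] args) throws InterruptedException {
        final PriorityCompactionQueue q = new PriorityCompactionQueue();
        q.requestCompaction("regionA", Pri.NORMAL);
        q.requestCompaction("regionB", Pri.BLOCKING);  // serviced ahead of regionA

        // The queue lives in one "global" place; N worker threads pull from
        // it, so N regions can compact in parallel when resources allow.
        for (int i = 0; i < 2; i++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            CompactionRequest r = q.queue.take();
                            System.out.println(Thread.currentThread().getName()
                                + " compacting " + r.region + " [" + r.pri + "]");
                        }
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "compactor-" + i);
            worker.setDaemon(true);
            worker.start();
        }
        Thread.sleep(500);  // let the daemon workers drain the queue before exiting
    }
}
{noformat}

Within a level, the sequence counter preserves FIFO order, so a flood of new NORMAL requests cannot starve older ones; only the BLOCKING level cuts in line.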
> Compaction requests should be prioritized to prevent blocking
> -------------------------------------------------------------
>
>                 Key: HBASE-2646
>                 URL: https://issues.apache.org/jira/browse/HBASE-2646
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.20.4
>         Environment: Ubuntu Server 10; HBase 0.20.4; 4-machine cluster (each machine an 8-core Xeon with 16 GB of RAM and 6 TB of storage); ~250 million rows
>            Reporter: Jeff Whiting
>            Priority: Critical
>             Fix For: 0.21.0
>
>         Attachments: prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4-machine HBase cluster, we were getting long and frequent client pauses as we attempted to load the data. Looking into the problem, we would build up a relatively large compaction queue, and when a region hit the "hbase.hstore.blockingStoreFiles" limit it would block the client while its compaction request was put at the back of the queue, waiting behind many other, less important compactions. The client is basically stuck at that point until the compaction is done. Prioritizing compaction requests and allowing the request that is blocking other actions to go first would help solve the problem.
> You can see the problem by looking at our log files.
> You'll first see an event such as "Too many hlogs", which forces flushes of many regions and puts a lot of requests on the compaction queue:
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s): responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324, responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592, responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873, responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747, responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930, responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659, responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319, responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543, responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586, responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718, responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817, responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952, responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557, responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940, responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013, responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756, responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186, responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see a region with too many store files being pushed to the back of the flush queue:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this:
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 60 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 84 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush is able to happen, which unblocks the IPC handlers and allows writes to continue. Unfortunately this process can take upwards of 15 minutes (the specific case shown here from our logs took about 4 minutes).
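[Editorial note: the feedback loop in those log excerpts can be summarized in a short sketch: the flusher skips regions over the store-file limit, the memstore then grows to the blocking size, and writers stall until a compaction finally reduces the store-file count. The sketch below is a hedged illustration, not HBase 0.20.4 source; the config key names in the comments are real HBase settings, but the 0.20-era defaults noted (7 store files, 64 MB x 2 blocking size) are assumptions, and every class and method name is hypothetical.]

{noformat}
import java.util.ArrayDeque;
import java.util.Queue;

// Hedged illustration of the flush/compaction feedback loop; not HBase code.
public class FlushBlockingSketch {

    // hbase.hstore.blockingStoreFiles (0.20-era default, assumed: 7)
    static final int BLOCKING_STORE_FILES = 7;
    // hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier
    // (assumed 64 MB * 2), matching the "blocking 128.0m size" in the logs.
    static final long BLOCKING_MEMSTORE_SIZE = 2 * 64L * 1024 * 1024;

    /** Hypothetical stand-in for a region's flush-relevant state. */
    static class Region {
        final String name;
        int storeFiles;
        long memstoreSize;
        Region(String name, int storeFiles, long memstoreSize) {
            this.name = name;
            this.storeFiles = storeFiles;
            this.memstoreSize = memstoreSize;
        }
    }

    final Queue<Region> flushQueue = new ArrayDeque<Region>();

    /** One flusher pass over a region, mirroring the log sequence above. */
    void tryFlush(Region r) {
        if (r.storeFiles >= BLOCKING_STORE_FILES) {
            // "...has too many store files, putting it back at the end of the
            // flush queue." The compaction request, meanwhile, lands at the
            // BACK of an unprioritized compaction queue.
            requestCompaction(r);
            flushQueue.add(r);
            if (r.memstoreSize >= BLOCKING_MEMSTORE_SIZE) {
                // "Blocking updates for ..." -- writers now stall until a
                // compaction drops storeFiles below the limit and the
                // deferred flush finally runs.
                System.out.println("Blocking updates on " + r.name);
            }
            return;
        }
        flush(r);  // normal path: empty the memstore, writers keep going
    }

    void requestCompaction(Region r) { /* see the priority-queue sketch above */ }

    void flush(Region r) { r.memstoreSize = 0; }

    public static void main(String[] args) {
        FlushBlockingSketch s = new FlushBlockingSketch();
        // A hot region already over the store-file limit with a full memstore:
        s.tryFlush(new Region("exampleRegion", 8, BLOCKING_MEMSTORE_SIZE));
    }
}
{noformat}

With a prioritized queue, requestCompaction would enqueue such a region at the BLOCKING level so it is compacted ahead of routine requests, shortening the window during which updates are blocked.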
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.