From: Mikhail Antonov
Date: Wed, 8 Jul 2015 13:02:11 -0700
Subject: Re: Automating major compactions
To: "user@hbase.apache.org"

I understand the reasoning behind compacting the regions with the largest
number of store files, but I don't follow why it's best to compact the
regions with the largest store files. Maybe I'm missing something? Perhaps
I'd compact the regions with the smallest average store file size instead?

You may also want to take a look at
https://issues.apache.org/jira/browse/HBASE-12859 and compact the regions
whose last major compaction ran the longest time ago.

-Mikhail

On Wed, Jul 8, 2015 at 10:30 AM, Dejan Menges wrote:

> Hi Behdad,
>
> Thanks a lot, but I already do this part. My question was more about
> which metrics (exposed or not) to use to figure out, as intelligently as
> possible, where a major compaction is needed most.
>
> Currently, choosing the region with the largest number of store files and
> then the largest store file size does the job, but I wasn't sure whether
> there is something better to choose from.
>
> Cheers,
> Dejan
>
> On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani
> wrote:
>
>> To start a major compaction of tablename from the CLI, run:
>> echo "major_compact 'tablename'" | hbase shell
>>
>> I do this after bulk loading into the table.
>>
>> FYI, to avoid surprises, I also turn off the load balancer and rebalance
>> regions manually.
>>
>> The CLI command to turn off the balancer is:
>> echo balance_switch false | hbase shell
>>
>> To rebalance regions after a bulk load or other changes, run:
>> echo balancer | hbase shell
>>
>> You can run these two commands over ssh. I use Ansible for this.
>> Assuming you have defined hbase_master in your hosts file, you can run
>> (the shell module is needed because of the pipe):
>> ansible -i hosts hbase_master -m shell -a "echo \"major_compact 'tablename'\" | hbase shell"
>>
>> Behdad Forghani
>>
>> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges
>> wrote:
>>
>> > Hi,
>> >
>> > What's the best way to automate major compactions without enabling
>> > them for an off-peak period?
>> >
>> > What I have been testing is a simple script that runs on every node in
>> > the cluster, checks whether a major compaction is already running on
>> > that node and, if not, picks one region and runs a major compaction on
>> > that one region.
>> >
>> > It has been running for some time and has helped get our data into
>> > much better shape, but now I'm no longer sure how to choose which
>> > region to compact. So far, for each node I read
>> > rs-status#regionStoreStats and first pick the region with the most
>> > store files, then those with the largest store file sizes.
>> >
>> > Is there maybe something more intelligent I could/should do?
>> >
>> > Thanks a lot!
>> >
>>

--
Thanks,
Michael Antonov
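A minimal Python sketch of the selection heuristics discussed in this thread: most store files first, then (as Mikhail suggests) the smallest average store file size, then the oldest last major compaction. The RegionStats record, its fields, and the pick_region and shell_statement helpers are hypothetical; collecting the numbers (for example from rs-status#regionStoreStats, or a last-compaction timestamp as discussed around HBASE-12859) is left out. The returned statement is meant for the HBase shell, whose major_compact command also accepts a region name instead of a table name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RegionStats:
    """Hypothetical per-region snapshot gathered by your own script."""
    name: str
    store_file_count: int
    store_file_size_mb: float
    last_major_compaction_ms: int  # epoch millis; 0 means unknown

    @property
    def avg_store_file_size_mb(self) -> float:
        # Guard against division by zero for regions with no store files.
        if self.store_file_count == 0:
            return 0.0
        return self.store_file_size_mb / self.store_file_count


def priority_key(r: RegionStats) -> tuple:
    # Smaller tuple = higher priority:
    # 1. more store files first (count is negated),
    # 2. then smaller average store file size,
    # 3. then older (or unknown, i.e. 0) last major compaction.
    return (-r.store_file_count, r.avg_store_file_size_mb, r.last_major_compaction_ms)


def pick_region(regions: Sequence[RegionStats], compaction_running: bool) -> Optional[str]:
    """Return the name of the region to compact next on this node, or None."""
    # Mirror the one-compaction-per-node rule of the original script.
    if compaction_running:
        return None
    # Regions with no store files have nothing to compact.
    candidates = [r for r in regions if r.store_file_count > 0]
    if not candidates:
        return None
    return min(candidates, key=priority_key).name


def shell_statement(region_name: str) -> str:
    # HBase shell statement that major-compacts a single region;
    # pipe it to `hbase shell` as in the commands above.
    return f"major_compact '{region_name}'"


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    regions = [
        RegionStats("region-a", 3, 900.0, 1_000),
        RegionStats("region-b", 5, 500.0, 2_000),
        RegionStats("region-c", 5, 250.0, 3_000),
    ]
    chosen = pick_region(regions, compaction_running=False)
    if chosen is not None:
        print(shell_statement(chosen))  # major_compact 'region-c'
```

Sorting on a tuple keeps the priority order explicit and easy to rearrange if you prefer a different tie-break. Treating an unknown timestamp (0) as the oldest errs on the side of compacting regions you know nothing about.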