Subject: Re: Automating major compactions
From: Behdad Forghani
To: user@hbase.apache.org
Date: Wed, 8 Jul 2015 12:18:43 -0500

To start a major compaction for tablename from the CLI, you need to run:

echo "major_compact 'tablename'" | hbase shell

I do this after bulk loading into the table. FYI, to avoid surprises, I also turn off the load balancer and rebalance regions manually.
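Note that major_compact only queues the compaction and returns immediately, so if you need to block until it has finished you can poll for completion. A minimal sketch, assuming the compaction_state shell command that recent HBase releases provide:

#!/usr/bin/env bash
# Sketch: trigger a major compaction, then poll until it has finished.
# Assumes the compaction_state shell command is available.
TABLE="$1"

echo "major_compact '${TABLE}'" | hbase shell

# Keep polling while the reported state still mentions a major compaction
# (MAJOR or MAJOR_AND_MINOR); compaction_state reports NONE once idle.
while echo "compaction_state '${TABLE}'" | hbase shell 2>/dev/null | grep -q MAJOR; do
  sleep 30
done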
The CLI command to turn off the balancer is:

echo balance_switch false | hbase shell

To rebalance regions after a bulk load or other changes, run:

echo balancer | hbase shell

You can run these two commands over ssh. I use Ansible for this. Assuming you have defined hbase_master in your hosts file, you can run (the shell module is needed because the command contains a pipe):

ansible -i hosts hbase_master -m shell -a "echo \"major_compact 'tablename'\" | hbase shell"

(A fuller end-to-end sketch follows after the quoted message below.)

Behdad Forghani

On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges wrote:

> Hi,
>
> What's the best way to automate major compactions without enabling it
> during an off-peak period?
>
> What I was testing is a simple script which runs on every node in the
> cluster, checks whether a major compaction is already running on that
> node, and if not picks one region and runs a compaction on just that
> region.
>
> It has been running for some time and has helped us get our data into
> much better shape, but now I'm not quite sure how to choose which region
> to compact. So far I was reading that node's rs-status#regionStoreStats
> and first choosing the regions with the largest number of storefiles, and
> then those with the largest storefile sizes.
>
> Is there maybe something more intelligent I could/should do?
>
> Thanks a lot!
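Putting the pieces above together, a rough wrapper that switches the balancer off, runs the major compaction, and then re-enables the balancer could look like the following. This is only a sketch: the script name compact_table.sh, the host name, and the wait step are placeholders, not anything taken from this thread.

#!/usr/bin/env bash
# compact_table.sh -- rough sketch; usage: ./compact_table.sh <table>
set -e
TABLE="$1"

# Switch the balancer off and queue the major compaction.
hbase shell <<EOF
balance_switch false
major_compact '${TABLE}'
EOF

# major_compact only queues the work; wait here until it has finished,
# for example with the compaction_state polling loop sketched earlier.

# Switch the balancer back on and rebalance.
hbase shell <<EOF
balance_switch true
balancer
EOF

You could then run it on the master over ssh with something like the following, where hbase-master and mytable are placeholders:

ssh hbase-master "bash -s mytable" < compact_table.sh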