From: Jean-Marc Spaggiari
Date: Wed, 8 Jul 2015 16:48:20 -0400
Subject: Re: Automating major compactions
To: user@hbase.apache.org

Just missing the ColumnFamily at the end of the path. Your memory is pretty
good.

JM

2015-07-08 16:39 GMT-04:00 Vladimir Rodionov:

> You can find this info yourself, Dejan
>
> 1. Locate the table dir on HDFS
> 2. List all regions (directories)
> 3. Iterate over the files in each region directory and find the oldest one
>    (by creation time)
> 4. The region with the oldest file is your candidate for major compaction
>
> /HBASE_ROOT/data/namespace/table/region (if my memory serves me right :))
>
> -Vlad
>
> On Wed, Jul 8, 2015 at 1:07 PM, Dejan Menges wrote:
>
> > Hi Mikhail,
> >
> > Actually, the reason is quite stupid on my side - to avoid compacting one
> > region over and over again while others are waiting in line (reading the
> > HTML status page and sorting only on the number of store files eventually
> > leaves you with a bunch of regions that have exactly the same number of
> > store files).
> >
> > Thanks for this hint - this is exactly what I was looking for. I was
> > previously trying to figure out whether it's possible to query meta for
> > this information (we're currently on 0.98.0 and 0.98.4, waiting for HDP
> > 2.3 from Hortonworks to upgrade immediately), but for our current version
> > I didn't find that possible, which is why I decided to go this way.
> >
> > On Wed, Jul 8, 2015 at 10:02 PM Mikhail Antonov wrote:
> >
> > > I totally understand the reasoning behind compacting the regions with
> > > the biggest number of store files, but I didn't follow why it's best to
> > > compact regions which have the biggest store files - maybe I'm missing
> > > something? I'd rather compact the regions which have the smallest
> > > average storefile size.
> > >
> > > You may also want to take a look at
> > > https://issues.apache.org/jira/browse/HBASE-12859, and compact the
> > > regions for which a major compaction was last run the longest time ago.
> > >
> > > -Mikhail
> > >
> > > On Wed, Jul 8, 2015 at 10:30 AM, Dejan Menges wrote:
> > > > Hi Behdad,
> > > >
> > > > Thanks a lot, but this part I already do. My question was more about
> > > > what to use (which exposed or unexposed metrics) to figure out most
> > > > intelligently where major compaction is needed the most.
> > > >
> > > > Currently, choosing the region with the biggest number of store files
> > > > plus the biggest total store file size is doing the job, but I wasn't
> > > > sure if there's something better to choose from.
> > > >
> > > > Cheers,
> > > > Dejan
> > > >
> > > > On Wed, Jul 8, 2015 at 7:19 PM Behdad Forghani <behdad@exapackets.com>
> > > > wrote:
> > > >
> > > >> To start a major compaction for tablename from the CLI, you need to
> > > >> run:
> > > >> echo major_compact tablename | hbase shell
> > > >>
> > > >> I do this after bulk loading into the table.
> > > >>
> > > >> FYI, to avoid surprises, I also turn off the load balancer and
> > > >> rebalance regions manually.
> > > >>
> > > >> The CLI command to turn off the balancer is:
> > > >> echo balance_switch false | hbase shell
> > > >>
> > > >> To rebalance regions after a bulk load or other changes, run:
> > > >> echo balancer | hbase shell
> > > >>
> > > >> You can run these two commands over ssh; I use Ansible for this.
> > > >> Assuming you have defined hbase_master in your hosts file, you can
> > > >> run:
> > > >> ansible -i hosts hbase_master -a "echo major_compact tablename |
> > > >> hbase shell"
> > > >>
> > > >> Behdad Forghani
> > > >>
> > > >> On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges <dejan.menges@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > What's the best way to automate major compactions without enabling
> > > >> > them only during an off-peak period?
> > > >> >
> > > >> > What I'm testing is a simple script which runs on every node in the
> > > >> > cluster, checks whether a major compaction is already running on
> > > >> > that node, and if not, picks one region and runs a major compaction
> > > >> > on that one region.
> > > >> >
> > > >> > It's been running for some time and has helped us get our data into
> > > >> > much better shape, but now I'm not quite sure how to choose which
> > > >> > region to compact anymore. So far I was reading that node's
> > > >> > rs-status#regionStoreStats page, first choosing the regions with the
> > > >> > biggest number of storefiles, and then those with the biggest
> > > >> > storefile sizes.
> > > >> >
> > > >> > Is there maybe something more intelligent I could/should do?
> > > >> >
> > > >> > Thanks a lot!
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
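Vlad's four steps in the thread above - walk the table directory, find each
region's oldest file, and nominate the region holding the overall oldest file -
can be sketched as a small shell function. This is only an illustration: it
assumes GNU find and a local directory laid out like HBase's
/HBASE_ROOT/data/namespace/table/region layout, whereas against a real cluster
you would parse `hdfs dfs -ls -R` timestamps instead of calling `find`. The
function name `oldest_region` is made up for the sketch.

```shell
# Sketch of the "oldest HFile wins" heuristic: given a table directory with
# one subdirectory per region, print the region whose oldest file has the
# earliest modification time (a stand-in for creation time). Illustrative
# only - walks a local filesystem; on a cluster, parse `hdfs dfs -ls -R`.
oldest_region() {
    table_dir="$1"
    best_region=""; best_ts=""
    for region in "$table_dir"/*/; do
        # Oldest file in this region, as an epoch timestamp (GNU find)
        ts=$(find "$region" -type f -printf '%T@\n' 2>/dev/null | sort -n | head -n 1)
        [ -z "$ts" ] && continue
        if [ -z "$best_ts" ] || awk -v a="$ts" -v b="$best_ts" 'BEGIN { exit !(a < b) }'; then
            best_ts="$ts"; best_region="${region%/}"
        fi
    done
    printf '%s\n' "$best_region"
}
```

The returned region directory name would then be the one to hand to
`major_compact` in the shell.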
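Dejan's working heuristic - most store files first, biggest total store file
size as the tie-breaker - reduces to a sort once the per-region stats are
scraped into a plain table. The three-column input format below (region name,
storefile count, total storefile size in MB) is an assumption for illustration,
not a format HBase emits directly, and feeding the winner into `major_compact`
is sketched only as a comment:

```shell
# Rank regions by storefile count (descending), then by total storefile
# size (descending), and print the top candidate for major compaction.
# Assumed input, one region per line (hypothetical scrape format):
#   <region-name> <storefileCount> <storefileSizeMB>
pick_compaction_candidate() {
    sort -k2,2nr -k3,3nr | head -n 1 | awk '{ print $1 }'
}
# The winner would then be fed back to the shell, e.g.:
#   echo "major_compact '<region-name>'" | hbase shell
```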
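Mikhail's counter-proposal - compact the region with the smallest average
storefile size - is just as easy to script over scraped per-region stats. The
input columns assumed below (region name, storefile count, total storefile size
in MB) are again a hypothetical scrape format; regions with zero storefiles are
skipped, since there is nothing to compact:

```shell
# Pick the region whose storefiles have the smallest average size.
# Assumed input, one region per line (hypothetical scrape format):
#   <region-name> <storefileCount> <storefileSizeMB>
smallest_avg_region() {
    awk '$2 > 0 {
        avg = $3 / $2
        if (best == "" || avg < best_avg) { best = $1; best_avg = avg }
    }
    END { if (best != "") print best }'
}
```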