Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of bbeaudreault@hubspot.com
 designates 74.125.149.69 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAPQV63UoPg4cohieyi1PhDO+bbj-Mb0JG3g9un9dkH7-Hwt==g@mail.gmail.com>
References: 
 <CAE24rAfnyjmng5CzuKg+_h-pQQL1aLBQFyB7R3V74sZkOLkK6w@mail.gmail.com>
 <CAPQV63UoPg4cohieyi1PhDO+bbj-Mb0JG3g9un9dkH7-Hwt==g@mail.gmail.com>
From: Bryan Beaudreault <bbeaudreault@hubspot.com>
Date: Tue, 9 Jul 2013 10:48:50 -0400
Message-ID: 
 <CANZDn9u_8Z+Rr_Cykc2XJ9zwOTr43aObzcHOykLXxi7nNEgAbA@mail.gmail.com>
Subject: Re: Disabled automated compaction - table still compacting
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=047d7b3432308596f604e11543b6

--047d7b3432308596f604e11543b6
Content-Type: text/plain; charset=ISO-8859-1

You should be able to limit what JM describes by tuning the following two
configs:

hbase.hstore.compactionThreshold
hbase.hstore.compaction.max

Also keep an eye on hbase.hstore.blockingStoreFiles when tuning the above, so
you don't accidentally cause flushes to block. I imagine you would be tuning
down rather than up, though, so it shouldn't be a problem.
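
For reference, here is a rough sketch of how hbase.hstore.compactionThreshold,
hbase.hstore.compaction.max and hbase.hstore.blockingStoreFiles might be set
in hbase-site.xml. The values are illustrative assumptions only, not
recommendations; check the defaults for your HBase version before changing
anything.

```xml
<!-- hbase-site.xml fragment. All values below are placeholders chosen for
     illustration, not tuned recommendations. -->
<configuration>
  <!-- Number of store files in a store at which a minor compaction
       becomes eligible to run. -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
  </property>

  <!-- Upper bound on the number of store files selected for a single
       minor compaction. -->
  <property>
    <name>hbase.hstore.compaction.max</name>
    <value>10</value>
  </property>

  <!-- When a store holds more files than this, flushes for the region are
       blocked until a compaction completes (or the blocking wait time
       expires), so keep it comfortably above the compaction settings. -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
</configuration>
```

The region servers need to pick up the new configuration (typically via a
restart) before the values take effect.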


On Tue, Jul 9, 2013 at 10:41 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi David,
>
> Minor compactions can be promoted to major compactions when all the
> files are selected for compaction, and the property below will not
> prevent that from happening.
>
> See section 9.7.6.5 here: http://hbase.apache.org/book/regions.arch.html
>
> JM
>
>
> 2013/7/9 David Koch <ogdude@googlemail.com>:
> > Hello,
> >
> > We disabled automated major compactions by setting
> > hbase.hregion.majorcompaction=0.
> > This was to avoid issues during bulk import of data since compactions
> > seemed to cause the running imports to crash. However, even after
> > disabling, region server logs still show compactions going on, as well as
> > aborted compactions. We also get compaction queue size warnings in
> > Cloudera Manager.
> >
> > Why is this the case?
> >
> > To be fair, we only disabled automated compactions AFTER the import
> > failed for the first time (yes, HBase was restarted), so maybe there are
> > some trailing compactions, but the queue size keeps increasing, which I
> > guess should not be the case. Then again, I don't know how aborted
> > compactions are counted, i.e. I am not sure whether or not to trust the
> > metrics on this.
> >
> > A bit more about what I am trying to accomplish:
> >
> > I am bulk loading about 100 indexed .lzo files with 20 * 10^6 Key-Values
> > (0.5 KB) each into an HBase table. Each file is loaded by a separate
> > Mapper job; several of these jobs run in parallel to make sure all task
> > trackers are used. Key distribution is the same in each file, so even
> > region growth is to be expected. We did not pre-split the table as it does
> > not seem to have been a limiting factor earlier.
> >
> > On a related note: what experience, if any, do other HBase/Cloudera users
> > have with the snapshotting feature detailed below?
> >
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_12.html
> >
> > We need a robust way to do inter-cluster cloning/back-up of tables,
> > preferably without taking the source table offline or impacting
> > performance of the source cluster. We only use HDFS files for importing
> > because the CopyTable job needs to run on the source cluster and cannot be
> > resumed once it fails.
> >
> > Thanks,
> >
> > /David
>

--047d7b3432308596f604e11543b6--