Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: local policy includes SPF record at
 spf.trusted-forwarder.org)
MIME-Version: 1.0
In-Reply-To: 
 <CAFWiQHaFJD3K4PfGT-goxrmbrXiAahjqAU3iiVq6S+6qSkMm+A@mail.gmail.com>
References: 
 <CAJ_21WX+Mm4vm-Em93-132SGBe=e3nQ+GWEuOmDp2P_2qkchpg@mail.gmail.com>
 <CAPQV63U1Y=My_Wo7pqK7CtTsPrN8A0fOzvkv3KBx8yN-sFzZZA@mail.gmail.com>
 <CAFWiQHaFJD3K4PfGT-goxrmbrXiAahjqAU3iiVq6S+6qSkMm+A@mail.gmail.com>
From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
Date: Sat, 7 Dec 2013 08:24:06 -0500
Message-ID: 
 <CAPQV63Un8+WsiQQZfy9ss+6C8Oe27CaOL4-kj2fgN+KDihN1qQ@mail.gmail.com>
Subject: Re: What is HBase compaction-queue-size at all?
To: user <user@hbase.apache.org>
Content-Type: multipart/alternative; boundary=20cf307ca07691c45604ecf1aec0

--20cf307ca07691c45604ecf1aec0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

> So I think it includes both running compactions and those in queue. Am I
> missing something?

Yes, that's correct. A major compaction is just a compaction running on all
the regions, so a region server will count it like any other compaction. But
what the RS is seeing can also be a minor one. So it's not necessarily a
major, but it can be.


2013/12/2 Bharath Vissapragada <bharathv@cloudera.com>

> Hi,
>
>
> On Mon, Dec 2, 2013 at 8:07 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> >    - Is it the *number of Stores* on the regionserver that need to be
> >    major compacted? Or the number of those *being* compacted currently?
> >
> >
> > This is the number that are currently in the pipe. It doesn't mean they
> > are compacting right now, but they are queued for compaction, and not
> > necessarily for a major compaction. Major is only if all the regions need
> > to compact.
> >
>
> Are you sure about this? I had a quick look at the code, and this value is
> the sum of the sizes of the largeCompactions and smallCompactions queues.
> The code doesn't keep track of whether they are running or in the queue.
> So I think it includes both running compactions and those in the queue. Am
> I missing something?
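
A minimal Python sketch of the metric as Bharath describes it: two queues, named here after the largeCompactions and smallCompactions pools, whose lengths are summed. Routing a request to the large or small queue by the total size of its files is modeled on the HBase compaction throttle setting (`hbase.regionserver.thread.compaction.throttle`); the 2 GiB threshold, the class names, and the request fields are hypothetical. Whether a compaction that is already running still counts is deliberately not modeled.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical throttle point; HBase derives its own value from
# hbase.regionserver.thread.compaction.throttle (or a computed default).
THROTTLE_BYTES = 2 * 1024**3


@dataclass
class CompactionRequest:
    region: str
    size_bytes: int  # total size of the files selected for compaction
    major: bool      # ignored by the routing below; kept for illustration


class CompactionQueues:
    """Toy model: one 'large' and one 'small' queue, routed by size only."""

    def __init__(self, throttle_bytes: int = THROTTLE_BYTES) -> None:
        self.throttle_bytes = throttle_bytes
        self.large: deque = deque()
        self.small: deque = deque()

    def request(self, req: CompactionRequest) -> None:
        # Routing looks at size, not at major vs. minor.
        if req.size_bytes > self.throttle_bytes:
            self.large.append(req)
        else:
            self.small.append(req)

    def queue_size(self) -> int:
        # The reported number: the sum of both queue lengths.
        return len(self.large) + len(self.small)


if __name__ == "__main__":
    q = CompactionQueues()
    q.request(CompactionRequest("r1", 600 * 1024**2, major=True))
    q.request(CompactionRequest("r2", 50 * 1024**2, major=False))
    q.request(CompactionRequest("r3", 3 * 1024**3, major=False))
    print(q.queue_size(), len(q.small), len(q.large))  # 3 2 1
```

Routing by size rather than by type would also be consistent with the log line quoted further down, where a major compaction ran on a smallCompactions thread.
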
>
>
> > "I was discovering that at some time it got *regionserver
> > compaction-queue-size = 4* (I checked it from Ambari). That's
> > theoretically impossible since I have only *one Store* to write
> > (sequential key) at any time, so incurring only one major compaction
> > would be more reasonable."
> >
>
> Adding to what JMS said, compaction is a per-region thing. If your write
> test creates multiple regions, there is a possibility that multiple
> compactions are pending at the same time, since they are queued.
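
A per-region view of the same point, as a minimal Python sketch: with one column family there is one store per region, so every region whose store has piled up enough files can contribute one entry to the queue, even if only one region is taking writes right now. The region names and file counts are invented, and the threshold of 3 with a strict "more than" comparison follows the wording used in this thread rather than a check of the HBase source.

```python
# Invented snapshot: region -> number of store files in its single family
# (one column family means one store per region).
STORE_FILES = {
    "region-a": 4,
    "region-b": 5,
    "region-c": 4,
    "region-d": 6,
    "region-e": 1,  # the region currently receiving the sequential writes
}

# Assumed threshold, following the "more than 3 files" wording in this thread.
COMPACTION_THRESHOLD = 3


def pending_requests(files_per_store: dict, threshold: int) -> list:
    """Return the stores that would each have one compaction request queued."""
    return [name for name, n in files_per_store.items() if n > threshold]


if __name__ == "__main__":
    pending = pending_requests(STORE_FILES, COMPACTION_THRESHOLD)
    print(len(pending), pending)
    # -> 4 ['region-a', 'region-b', 'region-c', 'region-d']
```
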
>
>
> >
> > Why is this "impossible"? A store file is a dump of HBase memory blocks
> > written to disk. This happens even if you write to a single region of a
> > single table, with keys all close together (even if it's all the exact
> > same key): when the block in memory reaches a threshold, it's written to
> > disk. When more than x of these files (3 is the default) are on disk, a
> > compaction is launched.
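
The flush-then-compact cycle described above can be simulated in a few lines of Python. This is a simplified sketch, not HBase's file-selection algorithm: each memstore flush adds one store file, and once the store holds more than `threshold` files they are all merged into one (which is what a major compaction does; a real minor compaction may pick only some of the files). The 128 MB flush size and the write volumes are assumptions.

```python
def simulate(writes_mb: int, flush_mb: int = 128, threshold: int = 3) -> tuple:
    """Return (flushes, compactions, files_left) for a single store.

    Simplified model: every flush writes one store file; when the store
    holds more than `threshold` files, all of them are merged into one.
    """
    memstore_mb = 0
    files = 0
    flushes = 0
    compactions = 0
    for _ in range(writes_mb):          # one iteration per MB written
        memstore_mb += 1
        if memstore_mb >= flush_mb:     # memstore full: flush it to a file
            memstore_mb = 0
            files += 1
            flushes += 1
            if files > threshold:       # too many files: compact into one
                files = 1
                compactions += 1
    return flushes, compactions, files


if __name__ == "__main__":
    print(simulate(1024))  # (8, 2, 2)
```

With 1 GB written into a single store, this model gives eight flushes and two compactions, even though only one store was ever written to.
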
> >
> >    - Even more confusing: wasn't multi-threading enabled in an earlier
> >    version, allocating each compaction job to its own thread? If so, why
> >    is there a compaction queue waiting for processing?
> >
> > Yes, compaction is done on a separate thread, but there is one single
> > queue. You don't want to take 100% of your RS resources to do
> > compactions...
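
To see why a queue exists even though compactions run on their own threads, here is a small discrete-event sketch in Python: a fixed number of workers drain one first-come, first-served queue. The pool size stands in for settings such as `hbase.regionserver.thread.compaction.small`; the arrival times and durations are invented. A request that arrives while a worker is idle waits 0 seconds, which is the situation a log message like "This selection was in queue for 0sec" would describe.

```python
import heapq


def queue_waits(jobs: list, workers: int = 1) -> list:
    """Return how long each job waited in the queue before a worker took it.

    jobs: (arrival_sec, duration_sec) pairs, served first-come, first-served.
    workers: size of the hypothetical compaction thread pool.
    Waits are returned in arrival order.
    """
    free_at = [0] * workers  # time at which each worker next becomes idle
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in sorted(jobs):
        earliest = heapq.heappop(free_at)   # the worker that frees up first
        start = max(arrival, earliest)      # wait if every worker is busy
        waits.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return waits


if __name__ == "__main__":
    jobs = [(0, 39), (10, 20), (15, 5)]   # invented requests, in seconds
    print(queue_waits(jobs, workers=1))   # [0, 29, 44]
    print(queue_waits(jobs, workers=2))   # [0, 0, 15]
```

A bigger pool shortens the waits, but every extra thread competes with reads and writes for the region server's disk and CPU, which is the trade-off described above.
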
> >
> > If you are doing mostly writes and almost no reads, you might want to
> > tweak some parameters. You might also want to look into bulk loading...
> >
> > Last, maybe you should review your keys and their distribution.
> >
> > And last again ;) What is your table definition? Multiplying the column
> > families can also sometimes lead to this kind of issue...
> >
> > JM
> >
> >
> >
> >
> > 2013/12/2 林煒清 <thesuperching@gmail.com>
> >
> > > Does anyone know what the compaction queue size means?
> > >
> > > By the doc's definition:
> > >
> > > *9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction
> > > queue. This is the number of stores in the region that have been
> > > targeted for compaction.
> > >
> > >
> > >    - Is it the *number of Stores* on the regionserver that need to be
> > >    major compacted? Or the number of those *being* compacted currently?
> > >
> > > I have a job writing data in a hotspot style using sequential keys
> > > (non-distributed) with 1 family, so there is 1 Store per region.
> > >
> > > I was discovering that at some time it got *regionserver
> > > compaction-queue-size = 4* (I checked it from Ambari). That's
> > > theoretically impossible since I have only *one Store* to write
> > > (sequential key) at any time, so incurring only one major compaction
> > > would be more reasonable.
> > >
> > >
> > >    - Then I dug into the logs and found nothing hinting at a queue
> > >    size > 0: every major compaction just says *"This selection was in
> > >    queue for 0sec"*. I don't really understand what that means. Is it
> > >    saying HBase has nothing in the compaction queue?
> > >
> > > 013-11-26 12:28:00,778 INFO
> > > [regionserver60020-smallCompactions-1385440028938] regionserver.HStore:
> > > Completed major compaction of 3 file(s) in f1 of myTable.key.md5....
> > > into md5....(size=607.8 M), total size for store is 645.8 M. *This
> > > selection was in queue for 0sec*, and took 39sec to execute.
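
To check queue times across many compactions instead of reading them one by one, the numbers can be pulled out of these log lines. A minimal Python sketch: the regular expression is an assumption written against the single line quoted above (its elided `....` names left as they are, the timestamp and the email's bold markers dropped), and other HBase versions may word the message differently.

```python
import re

# Assumed format, based only on the HStore log line quoted in this thread.
COMPLETED = re.compile(
    r"Completed (?P<major>major )?compaction of (?P<files>\d+) file\(s\)"
    r" in (?P<family>\S+) of .*?"
    r"This selection was in queue for (?P<queued>\d+)sec,"
    r" and took (?P<took>\d+)sec to execute"
)

# The quoted line, minus the timestamp and the email's bold markers.
SAMPLE = (
    "INFO [regionserver60020-smallCompactions-1385440028938] "
    "regionserver.HStore: Completed major compaction of 3 file(s) in f1 of "
    "myTable.key.md5.... into md5....(size=607.8 M), total size for store "
    "is 645.8 M. This selection was in queue for 0sec, and took 39sec to "
    "execute."
)


def parse_completed(line: str):
    """Return the fields of a 'Completed ... compaction' line, or None."""
    m = COMPLETED.search(line)
    if m is None:
        return None
    return {
        "major": m.group("major") is not None,
        "files": int(m.group("files")),
        "family": m.group("family"),
        "queued_sec": int(m.group("queued")),
        "took_sec": int(m.group("took")),
    }


if __name__ == "__main__":
    print(parse_completed(SAMPLE))
```
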
> > >
> > >
> > >    - Even more confusing: wasn't multi-threading enabled in an earlier
> > >    version, allocating each compaction job to its own thread? If so, why
> > >    is there a compaction queue waiting for processing?
> > >
> >
>
>
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>
>

--20cf307ca07691c45604ecf1aec0--