From: Jean-Marc Spaggiari
To: user@hbase.apache.org
Date: Mon, 2 Dec 2013 09:37:29 -0500
Subject: Re: What is HBase compaction-queue-size at all?

- Is it the *number of Stores* of the regionserver that need to be major
compacted? Or the number that are *being* compacted currently?

It is the number of stores that are currently in the pipe. It doesn't mean
they are compacting right now, but they are queued for compaction, and not
necessarily for a major compaction. A compaction is only major when all the
store files of a store are rewritten into a single one.

"I was discovering that at some time it got *regionserver
compaction-queue-size = 4* (I checked it from Ambari). That's theoretically
impossible since I have only *one Store* to write to (sequential key) at
any time, so incurring only one major compaction is more reasonable."

Why is this "impossible"? A store file is a dump of the HBase memstore
written to disk. Even if you write to a single region and a single table,
with keys that are all close by (even if it's always the same exact key),
the memstore is flushed to a new store file every time it reaches the flush
threshold. Once more than x store files (3 is the default) have accumulated
on disk, a compaction is queued.
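For reference, those two thresholds map to hbase.hregion.memstore.flush.size
and hbase.hstore.compactionThreshold in hbase-default.xml. A minimal sketch
that just reads them back from the configuration (assuming the HBase client
jars and your hbase-site.xml are on the classpath; the class name is made up
for the example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionKnobs {
  public static void main(String[] args) {
    // Picks up hbase-default.xml and hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();

    // Memstore flush size: once a region's memstore reaches this many
    // bytes it is flushed to a new store file (64 or 128 MB by default,
    // depending on the version).
    long flushSize = conf.getLong("hbase.hregion.memstore.flush.size",
        128L * 1024 * 1024);

    // Compaction trigger: once a store has at least this many store files,
    // the store is added to the compaction queue (default 3).
    int compactionThreshold =
        conf.getInt("hbase.hstore.compactionThreshold", 3);

    System.out.println("memstore flush size = " + flushSize + " bytes");
    System.out.println("compaction threshold = " + compactionThreshold
        + " files");
  }
}

To actually change them you set them in hbase-site.xml on the region
servers; raising the threshold delays compactions but lets more store files
pile up, so reads touch more files. It is a trade-off, not a free win.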
- Just more confusing: isn't multi-threading enabled in earlier versions,
allocating each compaction job to a thread? If so, why is there a
compaction queue waiting to be processed?

Yes, compactions run on separate threads, but there is a single queue. You
don't want to take 100% of your RS resources to do compactions...

If you are doing mostly writes and almost no reads, you might want to tweak
some parameters. You might also want to look into bulk loading...

Last, maybe you should review your key design and distribution.

And last again ;) What is your table definition? Multiplying the column
families can also sometimes lead to this kind of issue...

JM

2013/12/2 林煒清

> Anyone know what the compaction queue size actually means?
>
> By the doc's definition:
>
> *9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction
> queue. This is the number of stores in the region that have been targeted
> for compaction.
>
> - Is it the *number of Stores* of the regionserver that need to be major
> compacted? Or the number that are *being* compacted currently?
>
> I have a job writing data in a hotspot style using a sequential
> (non-distributed) key with 1 family, so there is 1 Store per region.
>
> I was discovering that at some time it got *regionserver
> compaction-queue-size = 4* (I checked it from Ambari). That's
> theoretically impossible since I have only *one Store* to write to
> (sequential key) at any time, so incurring only one major compaction is
> more reasonable.
>
> - Then I dug into the logs and found nothing hinting at a queue size > 0:
> every major compaction just says *"This selection was in queue for
> 0sec"*. I don't really understand what that means. Is it saying HBase has
> nothing in the compaction queue?
>
> 2013-11-26 12:28:00,778 INFO
> [regionserver60020-smallCompactions-1385440028938] regionserver.HStore:
> Completed major compaction of 3 file(s) in f1 of myTable.key.md5.... into
> md5.... (size=607.8 M), total size for store is 645.8 M. *This selection
> was in queue for 0sec*, and took 39sec to execute.
>
> - Just more confusing: isn't multi-threading enabled in earlier versions,
> allocating each compaction job to a thread? If so, why is there a
> compaction queue waiting to be processed?
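If you want to watch the queue without going through Ambari, and your
region servers expose the JSON /jmx servlet on the info port (60030 was the
default in the 0.94/0.96 era), a rough sketch along these lines should
work. The host and class name are placeholders; the metric shows up as
compactionQueueSize with the old metrics and as compactionQueueLength with
metrics2, so it greps for both.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class CompactionQueueWatcher {
  public static void main(String[] args) throws Exception {
    // Placeholder host; pass the real region server host as the first arg.
    String rsHost = args.length > 0 ? args[0] : "rs-host.example.com";
    URL jmx = new URL("http://" + rsHost + ":60030/jmx");

    try (BufferedReader in =
        new BufferedReader(new InputStreamReader(jmx.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        // Old-style metrics report "compactionQueueSize";
        // metrics2 (0.96+) reports "compactionQueueLength".
        if (line.contains("compactionQueueSize")
            || line.contains("compactionQueueLength")) {
          System.out.println(line.trim());
        }
      }
    }
  }
}

A non-zero value only means stores are queued, not stuck: the "This
selection was in queue for 0sec" lines above show the queue was being
drained as fast as it was filled.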