hbase-user mailing list archives

From Bharath Vissapragada <bhara...@cloudera.com>
Subject Re: What is HBase compaction-queue-size at all?
Date Mon, 02 Dec 2013 15:01:45 GMT
Hi,


On Mon, Dec 2, 2013 at 8:07 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

>    - Is it the *number of Stores* on the regionserver that need to be major
>    compacted, or the number that are *being* compacted currently?
>
>
> This is the number that are currently in the pipe. It doesn't mean they are
> compacting right now, only that they are queued for compaction, and not
> necessarily major compaction. Major is only if all the regions need to compact.
>

Are you sure about this? I had a quick look at the code, and this value is
the sum of the sizes of the largeCompactions and smallCompactions queues. The
code doesn't keep track of whether they are running or still waiting in the
queue. So I think it includes both running compactions and those in the queue.
Am I missing something?
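
To make the question concrete, here is a toy Python model of a metric built as
the sum of two queue sizes. It is only a sketch, not HBase code: the class
CompactionPools, its methods, and the store names are hypothetical, and only
the two queue names come from the code mentioned above. The model assumes a
request leaves its queue when a worker picks it up, so running compactions are
not counted. Whether the real pools behave that way is exactly the open
question.

```python
from collections import deque


class CompactionPools:
    """Toy stand-in for a region server's two compaction queues (not HBase code)."""

    def __init__(self):
        # Queue names mirror the ones mentioned above; everything else is made up.
        self.large_compactions = deque()
        self.small_compactions = deque()
        self.running = []

    def request(self, store, large=False):
        """Queue a compaction request for one store."""
        target = self.large_compactions if large else self.small_compactions
        target.append(store)

    def start_next(self):
        """Assumption: a worker removes the request from its queue when it starts."""
        for pending in (self.large_compactions, self.small_compactions):
            if pending:
                store = pending.popleft()
                self.running.append(store)
                return store
        return None

    def compaction_queue_size(self):
        """The metric as described above: the sum of the two queue sizes."""
        return len(self.large_compactions) + len(self.small_compactions)


pools = CompactionPools()
for store in ("region-a/f1", "region-b/f1", "region-c/f1"):
    pools.request(store)
pools.request("region-d/f1", large=True)

before = pools.compaction_queue_size()  # all four requests are still waiting
pools.start_next()                      # one request moves to "running"
after = pools.compaction_queue_size()
print(before, after)
```

If the real queues instead kept a request until its compaction finished, the
second number would stay equal to the first, which is the behaviour I am
guessing at.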


> "I discovered that at some point *regionserver compaction-queue-size = 4*
> (I checked it in Ambari). That's theoretically impossible, since I have only
> *one Store* to write to (sequential key) at any time; incurring only one
> major compaction would be more reasonable."
>

Adding to what JMS said, compaction is a per region thing. If your write
test creates multiple regions, there is a possibility that multiple
compactions happen at the same time since they are queued.


>
> Why is this "impossible"? A store file is a dump of HBase memory blocks
> written to disk. This holds even if you write to a single region, single
> table, with keys all close by (even if it's all the exact same key). When the
> block in memory reaches a threshold, it's written to disk. When more than x
> blocks (3 is the default) are on disk, a compaction is launched.
>
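
The flush-then-compact cycle described in the quoted paragraph can be sketched
with a small Python simulation. This is a hedged illustration, not HBase code:
simulate_store and its flush_every parameter are hypothetical, the unit of
"one write" is arbitrary, and the code assumes a request is made as soon as
the store-file count reaches the threshold and that each compaction merges all
files into one.

```python
def simulate_store(writes, flush_every=4, compaction_threshold=3):
    """Count flushes and compaction requests for a single store.

    writes: number of cells written to the store.
    flush_every: hypothetical memstore limit, in cells, before a flush.
    compaction_threshold: store-file count that triggers a compaction
        (3 matches the default quoted above).
    """
    memstore = 0
    store_files = 0
    flushes = 0
    compaction_requests = 0
    for _ in range(writes):
        memstore += 1
        if memstore >= flush_every:
            # The in-memory data is dumped to a new store file on disk.
            memstore = 0
            store_files += 1
            flushes += 1
            # Assumption: a request is made once the count reaches the threshold.
            if store_files >= compaction_threshold:
                compaction_requests += 1
                # Assumption: the compaction merges every file into one.
                store_files = 1
    return flushes, compaction_requests


flushes, requests = simulate_store(writes=40)
print(flushes, requests)
```

Even with a single store and purely sequential writes, the model keeps
generating compaction requests as long as flushes keep arriving, so more than
one request over time is expected rather than impossible.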
>    - Even more confusing: wasn't multi-threading enabled in an earlier
>    version, so that each compaction job is allocated to its own thread? If so,
>    why is there a compaction queue waiting for processing?
>
> Yes, compaction is done on a separate thread, but there is one single
> queue. You don't want to take 100% of your RS resources to do compactions...
>
> If you are doing mostly writes and almost no reads, you might want to tweak
> some parameters. You might also want to look into bulk loading...
>
> Last, maybe you should review your key and distribution.
>
> And last again ;) What is your table definition? Multiplying the column
> families can also sometimes lead to this kind of issue...
>
> JM
>
>
>
>
> 2013/12/2 林煒清 <thesuperching@gmail.com>
>
> > Does anyone know what compaction queue size means?
> >
> > By doc's definition:
> >
> > *9.2.5.* hbase.regionserver.compactionQueueSize Size of the compaction
> > queue. This is the number of stores in the region that have been targeted
> > for compaction.
> >
> >
> >    - Is it the *number of Stores* on the regionserver that need to be major
> >    compacted, or the number that are *being* compacted currently?
> >
> > I have a job writing data in a hotspot style, using a sequential
> > (non-distributed) key with 1 family, so there is 1 Store per region.
> >
> > I discovered that at some point *regionserver compaction-queue-size = 4*
> > (I checked it in Ambari). That's theoretically impossible, since I have
> > only *one Store* to write to (sequential key) at any time; incurring only
> > one major compaction would be more reasonable.
> >
> >
> >    - Then I dug into the logs and found no hint of a queue size > 0. Every
> >    major compaction just says *"This selection was in queue for 0sec"*. I
> >    don't really understand what that means. Is it saying HBase has nothing
> >    in the compaction queue?
> >
> > 013-11-26 12:28:00,778 INFO
> > [regionserver60020-smallCompactions-1385440028938] regionserver.HStore:
> > Completed major compaction of 3 file(s) in f1 of myTable.key.md5.... into
> > md5....(size=607.8 M), total size for store is 645.8 M.*This selection was
> > in queue for 0sec*, and took 39sec to execute.
> >
> >
> >    - Even more confusing: wasn't multi-threading enabled in an earlier
> >    version, so that each compaction job is allocated to its own thread? If
> >    so, why is there a compaction queue waiting for processing?
> >
>



-- 
Bharath Vissapragada
<http://www.cloudera.com>
