hbase-user mailing list archives

From Brennon Church <bren...@getjar.com>
Subject Re: Question about compactions
Date Thu, 21 Mar 2013 20:44:17 GMT

Here are the data locality index values for all 8 nodes:


Those seem pretty bad to me.

I'm running HBase 0.92.0.

I'd considered the async problem, and was going to add some basic checks 
to the script so that it won't submit additional compactions while the 
queue still has anything in it.
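
Roughly what I have in mind, as a sketch against the 0.92-era Java
client (the pause length is just a placeholder for the queue check, and
the class/method names are the 0.92 client API as I recall them):

import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

public class RollingMajorCompact {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTable table = new HTable(conf, args[0]); // table name from argv
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        for (HRegionInfo region : regions.keySet()) {
            // major_compact only *queues* the request (it's async), so
            // pace the submissions instead of firing them all at once.
            admin.majorCompact(region.getRegionNameAsString());
            // Crude pacing; the compaction-queue check would replace this.
            Thread.sleep(60 * 1000L);
        }
        table.close();
    }
}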

For the moment, it seems my best bet is to run through the major 
compactions for everything to regain locality.  Going forward, we may or 
may not need the major compactions on a regular basis.  I can tell you 
it's been several months since we turned them off, and performance has 
been reasonable.
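
For the queue check (and to watch locality recover while I re-compact),
something like this against each RS's JMX port might work, assuming
remote JMX is enabled in hbase-env.sh. The port, MBean, and attribute
names (RegionServerStatistics, compactionQueueSize,
hdfsBlocksLocalityIndex) are what I believe the 0.92 metrics expose;
worth verifying in jconsole first:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RsMetricsProbe {
    public static void main(String[] args) throws Exception {
        String host = args[0];
        // 10102 is a common RS JMX port when enabled; adjust to your setup.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":10102/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName stats = new ObjectName(
                    "hadoop:service=RegionServer,name=RegionServerStatistics");
            // Queue depth gates the script; locality shows the recovery.
            Number queue = (Number) mbsc.getAttribute(stats, "compactionQueueSize");
            Number locality = (Number) mbsc.getAttribute(stats, "hdfsBlocksLocalityIndex");
            System.out.println(host + ": compactionQueueSize=" + queue
                    + ", hdfsBlocksLocalityIndex=" + locality);
        } finally {
            connector.close();
        }
    }
}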



On 3/21/13 10:49 AM, Jean-Daniel Cryans wrote:
> On Thu, Mar 21, 2013 at 6:46 AM, Brennon Church <brennon@getjar.com> wrote:
>> Hello all,
>> As I understand it, a common performance tweak is to disable major
>> compactions so that you don't end up with storms taking things out at
>> inconvenient times.  I'm thinking that I should just write a quick script
>> to rotate through all of our regions, one at a time, and compact them.
>> Again, if I'm understanding this correctly, we should not end up with
>> storms, as they'll only happen one at a time and each one doesn't run
>> for long.  Does that seem reasonable, or am I missing something?  My
>> hope is to run the script regularly.
> FWIW, major compacting isn't even needed if you don't update or delete
> cells, so do consider that too.
> The problem with scheduling major compactions yourself is that, since
> the command is async, you can still end up with a storm of compactions
> if you just blindly issue major_compact for all your regions. Adding
> wait time between requests works, but say you want the compactions to
> run only between 2 and 4 AM: you can then run out of time. What I have
> seen done to circumvent this is to compact only a subset of the
> regions at a time. You can also use JMX to monitor the compaction
> queue on each RS and make sure you are not just piling them up, but
> this requires some more work.
>> Corollary question... I recently added drives to our nodes while they
>> were all still running, basically just restarting the datanode
>> underneath to pick up the new spindles.  I'm fairly sure I've thrown
>> data locality out the window, based on the changed pattern of network
>> traffic.
> Interesting but unlikely. Even restarting HBase shouldn't do that
> unless it was wrongly restarted. Each RS publishes a locality index
> (hdfsBlocksLocalityIndex) that you can find via JMX or in its web
> UI; are they close to 100% or way down? Also, which version are you on?
>> If I'm right, manually running major compactions against all of
>> the regions should resolve that, as the underlying data would all get
>> written locally.  Again, does that make sense?
> Major compacting would do that, yes, but first check whether you need
> it at all, I think.
> J-D
