hbase-user mailing list archives

From Jean-Adrien <a...@jeanjean.ch>
Subject Re: HBase behaviour at startup (compression)
Date Fri, 19 Dec 2008 17:15:06 GMT

Hello,

Andrew and St.Ack, thanks for your answers, and excuse me for the confusion
between compression and compaction...

I reviewed the concept of major / minor compaction in the wiki and I looked
at both JIRA issues, HBASE-938 and HBASE-1062.

Since I'm running HBase 0.18.0, I certainly have the problem described in
HBASE-938. If I understand it correctly, the problem is that at startup
every opened region that needs compaction runs a major compaction, because
the timestamp of the latest major compaction is not stored anywhere. The
(in-memory) counter is therefore reset to the startup time, and the next
major compaction takes place 1 day later (with the default config).
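
To make that reading concrete, here is a toy model of it in Python. It is
only a sketch of my understanding, not HBase code: ToyRegion, on_open,
next_major_due and restart are hypothetical names, and the one-day interval
is the default mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

# One day in seconds: the default major compaction interval mentioned above.
MAJOR_INTERVAL_S = 24 * 60 * 60


@dataclass
class ToyRegion:
    """Hypothetical model of a region; none of these names exist in HBase."""

    # Time of the last major compaction, kept in memory only (assumption).
    last_major: Optional[float] = None

    def on_open(self, now: float) -> str:
        """Return the kind of compaction scheduled when the region opens."""
        if self.last_major is None:
            # Nothing was persisted, so the age of the last major compaction
            # is unknown: run a major one and restart the timer from now.
            self.last_major = now
            return "major"
        return "minor"

    def next_major_due(self) -> Optional[float]:
        """Time at which the next periodic major compaction would be due."""
        if self.last_major is None:
            return None
        return self.last_major + MAJOR_INTERVAL_S


def restart(region: ToyRegion) -> ToyRegion:
    """Model a region server restart: the in-memory timestamp is lost."""
    return ToyRegion()


region = ToyRegion()
print(region.on_open(now=1000.0))   # first open after startup
print(region.next_major_due())      # startup time plus one day
region = restart(region)
print(region.on_open(now=5000.0))   # the timestamp was lost, so major again
```

In this model every restart triggers a major compaction on open, which is
the behaviour I believe I am seeing.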

I'm not sure what you mean by a no-operation compaction. Is it the case
where a region has not been modified since its last major compaction, so
that there is a single mapfile and a major compaction would merge this file
with itself (i.e. nothing to do)? Is that correct? And does the INFO line
still appear in the log in that case?

With my version, 0.18.0, there is no way to tell major from minor
compactions by looking at the log, correct?
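
Even without the major / minor distinction, the start / completed INFO pairs
quoted further down give a duration per region. A minimal Python sketch,
assuming each log entry sits on one unwrapped line in the format of those
quoted entries and that region names contain no spaces; the function name
compaction_durations is my own.

```python
import re
from datetime import datetime

# Matches the two HRegion INFO messages quoted later in this message.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+: "
    r"(?P<event>starting compaction on|compaction completed on) region "
    r"(?P<region>\S+)"
)


def compaction_durations(lines):
    """Pair start/completed entries and return (region, seconds) tuples."""
    started = {}
    durations = []
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # not a compaction entry
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        region = match["region"]
        if match["event"].startswith("starting"):
            started[region] = ts
        elif region in started:
            elapsed = ts - started.pop(region)
            durations.append((region, elapsed.total_seconds()))
    return durations


log = [
    "2008-12-17 11:04:46,688 INFO "
    "org.apache.hadoop.hbase.regionserver.HRegion: "
    "starting compaction on region "
    "test-D-0.3,GST13927+129099482919-13927,1229196632010",
    "2008-12-17 11:05:36,196 INFO "
    "org.apache.hadoop.hbase.regionserver.HRegion: "
    "compaction completed on region "
    "test-D-0.3,GST13927+129099482919-13927,1229196632010 in 49sec",
]
for region, seconds in compaction_durations(log):
    print(region, seconds)
```

Sorting the result by duration would show which regions dominate the
startup batch, although it still cannot say whether a given compaction was
major or minor.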

Is it possible that a 'noop' major compaction of an untouched HRegion
rewrites the file in HDFS anyway? I suspect so for the following reason:
I raised the replication factor in my hadoop / hbase config file from 2 to
3 and then restarted HBase. As usual, a batch of compactions took place,
and when it had finished I observed (using fsck) that all files had the new
replication factor of 3. So all files must have been written since I
modified the replication factor value; had they not been rewritten, they
would still have a replication factor of 2. But many tables had not been
modified for a long time, so their mapfiles should already have been
compacted and should not have been rewritten.

Here is what may be my problem during major compaction:
I think (I'm not sure, I have to find a better tool to monitor my network)
that with my light configuration (see below for details) the compaction
itself is quick, even when, for example, a single modification in a cell
leads to a major compaction that rewrites the whole file. My regionservers
run on the same machines as the datanodes, so they communicate directly
(fast) when the RS asks the DN to store a mapfile.
The datanode then places replicas of the blocks on the 2 other datanodes
through the slow 100Mbit/s network. At HBase startup time, if hadoop asks
the network to transfer about 200Gb, the bandwidth might be saturated. The
leases expire and the RSs shut themselves down. That could also explain the
problem of max Xcievers sometimes being reached in the datanodes, which we
discussed in a previous post.
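
A rough back-of-the-envelope check of that saturation guess, as a Python
sketch. It is idealized: it ignores protocol overhead and assumes all
replica traffic shares a single 100Mbit/s link, which a switched network
may not do. Since "200Gb" is ambiguous, both readings (gigabytes and
gigabits) are computed; transfer_seconds is a name of my own.

```python
def transfer_seconds(payload_bytes: float, link_bits_per_s: float) -> float:
    """Ideal time to push payload_bytes through a link, ignoring overhead."""
    return payload_bytes * 8 / link_bits_per_s


LINK = 100e6  # 100Mbit/s ethernet, from the configuration in this message

# Reading "200Gb" as 200 gigabytes (decimal).
as_gigabytes = transfer_seconds(200e9, LINK)
# Reading "200Gb" as 200 gigabits, i.e. 25 gigabytes.
as_gigabits = transfer_seconds(200e9 / 8, LINK)

print(f"200 gigabytes: {as_gigabytes:.0f} s ({as_gigabytes / 3600:.1f} h)")
print(f"200 gigabits:  {as_gigabits:.0f} s ({as_gigabits / 60:.0f} min)")
```

Under the gigabyte reading the link stays busy for about 4.4 hours, and
under the gigabit reading for about 33 minutes, so in both cases much
longer than a single compaction takes.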

If this is the issue, in my opinion it should be the responsibility of
hadoop not to accept new files from its clients when the number of Xcievers
is large.

Anyway, HBASE-938 will certainly help.

Configuration:
Hadoop 0.18.1
HBase 0.18.0
4 nodes (1Gb ram): (NS / Master) ; (DN) ; (DN / RS) ; (DN / RS)
100Mbit/s ethernet

Thanks for everything.
Have a nice day.

Jean-Adrien


stack-3 wrote:
> 
> Jean-Adrien wrote:
>> Hello,
>>
>> I have a question regarding the behavior of HBase at startup time.
>> First the region servers load all regions of enabled tables, then a batch
>> task of (minor?) compression is made on some of these regions:
>>
>> 2008-12-17 11:04:46,688 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> starting compaction on region
>> test-D-0.3,GST13927+129099482919-13927,1229196632010
>> 2008-12-17 11:05:36,196 INFO
>> org.apache.hadoop.hbase.regionserver.HRegion:
>> compaction completed on region
>> test-D-0.3,GST13927+129099482919-13927,1229196632010 in 49sec
>>
>> What are the concerned regions ? All of them ? Only the region that have
>> been modified during the last roll of log ?
>>   
> All regions on open schedule a compaction (Usually the compaction is 'minor' 
> unless the 'major' interval has elapsed).
> 
> We added this a while back for the following reason.  Region opens 
> usually are the result of a split.  Splits are done by creating facades 
> on the parent regions mapfiles.  These facades -- or 'References' in 
> hbase-speak --  reference the parent regions' mapfiles;  one facade 
> serves up the top-half of the parent's mapfiles while the other serves 
> the bottom-half.  This mechanism makes it so splits run fast.  Downside 
> is that while these References are present in a region, the region is 
> not splittable to avoid build up of compound, fragile 
> References-to-References.... relationships.  Compactions clean up 
> References by writing the content of the parents top or bottom half into 
> new mapfiles in the daughter regions.  During heavy-duty uploading, 
> splits are fast and furious.  To keep it so regions are splittable as 
> soon as possible, we were scheduling clean-up of References as fast as 
> possible by immediately scheduling a compaction.
> 
> Missing from the above is special handling of startup.  Andrew has 
> started work on this in hbase-1062.
> 
> 
> 
>> In my case it takes several hours to complete, since I have about 500
>> regions for 2 region servers. And if I have well  understood how hadoop
>> works, it yield that the entire hdfs content is rewritten during this
>> phase,
>> since the file are written once. Isn't it ?
>>   
> 
> Sounds like original report on HBASE-938 (though the issue got hijacked 
> to address a different issue).  Do you think a major compaction is being 
> triggered on each startup?
> 
> Was this a clean shutdown Jean-Adrien?
> 
> As to rewriting all data, it shouldn't be.  Before the HBASE-938 fix, 
> we'd rewrite all data if a major compaction but not since its commit.
> 
> TRUNK has improvements in this area including logging what type of 
> compaction is running, whether major or minor.
> 
> 
>> If I disable and re-enable a table, must the compactions re-run ?
>>   
> Since regions are opened on reenable, compaction check will be scheduled 
> but if nothing to do, the compaction will be a noop.
> 
> St.Ack
> 
> 

-- 
View this message in context: http://www.nabble.com/HBase-behaviour-at-startup-%28compression%29-tp21051218p21094861.html
Sent from the HBase User mailing list archive at Nabble.com.

