hbase-user mailing list archives

From Iulia Zidaru <iulia.zid...@1and1.ro>
Subject Compactions in busy system(s)
Date Tue, 05 Apr 2011 10:45:44 GMT
  Hi all,

I'm not sure I have understood the purpose of major compaction correctly, 
or how to handle it in a busy system.
As I understand it, running a major compaction matters when a lot of data 
has been deleted, because it removes the "marked as deleted" flags.
There are also the "flush" and "minor compaction" operations, which are 
associated with writing to disk. I understand that a minor compaction 
rewrites several of the files produced by flushes into a single file. What 
is not clear to me is whether major compaction does the same thing (so 
that it can be skipped when there are no deletes in the system), or 
whether it also does something that minor compaction does not, so that 
skipping it may hurt performance or data volume.
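
To make the distinction being asked about concrete, here is a toy model, 
not HBase code. It assumes the usual description: a flush writes one new 
file, a minor compaction merges some files and keeps delete markers, and a 
major compaction merges all files of a store and drops delete markers 
together with the cells they mask. The names (flush, minor_compact, 
major_compact) are illustrative only, and version limits and TTL are 
ignored.

```python
from typing import Dict, List, Optional, Tuple

# A cell is (row, timestamp, value); value None stands for a delete marker.
Cell = Tuple[str, int, Optional[str]]
StoreFile = List[Cell]


def _sort(cells: List[Cell]) -> StoreFile:
    # Sort by row, then timestamp (never compare the values themselves).
    return sorted(cells, key=lambda c: (c[0], c[1]))


def flush(memstore: List[Cell]) -> StoreFile:
    """Write the in-memory cells out as one new sorted file."""
    return _sort(memstore)


def minor_compact(files: List[StoreFile]) -> StoreFile:
    """Merge the given files into one; delete markers and masked cells stay."""
    return _sort([cell for f in files for cell in f])


def major_compact(files: List[StoreFile]) -> StoreFile:
    """Merge all files into one and drop markers plus the cells they mask."""
    merged = minor_compact(files)
    # Newest delete timestamp seen for each row.
    deleted_at: Dict[str, int] = {}
    for row, ts, value in merged:
        if value is None:
            deleted_at[row] = max(ts, deleted_at.get(row, ts))
    # Keep only live cells that are newer than any delete on their row.
    return [
        (row, ts, value)
        for row, ts, value in merged
        if value is not None and ts > deleted_at.get(row, -1)
    ]
```

In this model, skipping major compaction never loses data; it only means 
that deleted cells and their markers keep occupying space and keep being 
read past.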

Another thing I would like help clarifying is whether a major compaction 
of the whole dataset is simply the sum of the major compactions of all its 
regions. If so, is it possible to major compact only some regions at one 
time and the remaining regions at another time? I also don't fully 
understand whether the system can merge a region holding little data with 
another region and, if it can, which of the operations mentioned above 
might disturb normal system behavior (i.e. what NOT to do).
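
The "some regions at a time" idea can be sketched as a rolling schedule. 
This is only an outline under the assumption that a major compaction can 
be requested for a single region; compact_region and pause are 
hypothetical callbacks standing in for whatever admin call and throttling 
a real deployment would use, and they are not HBase APIs.

```python
from typing import Callable, Iterator, List


def batches(regions: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` region names."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(regions), size):
        yield regions[start:start + size]


def rolling_major_compact(
    regions: List[str],
    size: int,
    compact_region: Callable[[str], None],  # hypothetical: compacts one region
    pause: Callable[[], None] = lambda: None,  # hypothetical: wait between batches
) -> List[str]:
    """Request a major compaction batch by batch; return regions in order."""
    done: List[str] = []
    for batch in batches(regions, size):
        for region in batch:
            compact_region(region)
            done.append(region)
        pause()  # give the cluster time to absorb the extra I/O
    return done
```

Passing a function that sleeps as pause spreads the extra disk and network 
load over a longer window, which is the point of not compacting every 
region at once.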

The last point concerns the files in HDFS (this might affect data 
volume). When is data actually deleted from HDFS, in minor and in major 
compactions? Are the old files deleted when a compaction is performed, or 
are they only marked as deleted?

Thank you,

Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
0040 31 223 9153

