hbase-dev mailing list archives

From WangYQ <wangyongqiang0...@163.com>
Subject improvement on compaction
Date Fri, 13 Nov 2015 08:51:33 GMT
In HBase 0.98.10, DefaultCompactPolicy sorts HFiles using seq_id as the main factor. The new file
created by a compaction gets its seq_id from the HRegion. Suppose we have some HFiles whose seq_ids
are as follows:
f1  4
f2  6
f3  8
f4  9
f5  12

If we compact files f2, f3, and f4 into f6_new, the new file gets a seq_id larger than that of f5, say 14:
f1  4
f5   12
f6_new    14

When we compact, we delete whole HFiles whose maxTimeStamp has expired.
But in the example above, HFiles with small timestamps end up being compacted together with HFiles
with large timestamps, just because they have similar seq_ids.
This decreases the chance of deleting whole old HFiles.
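
To illustrate why mixing old and new data hurts, here is a minimal sketch of a whole-file expiry check. FileInfo and isWhollyExpired are hypothetical names for illustration, not HBase classes, and the timestamps and TTL are made-up values. The only assumption is that a merged file's maxTimeStamp is the maximum over its inputs.

```java
public class WholeFileExpirySketch {
    // Hypothetical stand-in for the metadata of one HFile; not an HBase class.
    static final class FileInfo {
        final String name;
        final long seqId;
        final long maxTimestamp;

        FileInfo(String name, long seqId, long maxTimestamp) {
            this.name = name;
            this.seqId = seqId;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // A file can be dropped whole only if its newest cell is older than the TTL cutoff.
    static boolean isWhollyExpired(FileInfo f, long now, long ttlMillis) {
        return f.maxTimestamp < now - ttlMillis;
    }

    // Merging files yields a file whose maxTimestamp is the max over the inputs.
    static FileInfo merge(String name, long seqId, FileInfo a, FileInfo b) {
        return new FileInfo(name, seqId, Math.max(a.maxTimestamp, b.maxTimestamp));
    }

    public static void main(String[] args) {
        long now = 1_000L;
        long ttl = 100L; // illustrative values: cutoff is 900

        FileInfo oldFile = new FileInfo("old", 6, 500L);   // all cells expired
        FileInfo newFile = new FileInfo("new", 12, 950L);  // still has live cells
        FileInfo merged = merge("merged", 14, oldFile, newFile);

        System.out.println(isWhollyExpired(oldFile, now, ttl)); // true
        System.out.println(isWhollyExpired(merged, now, ttl));  // false
    }
}
```

Once old data is merged with newer data, the expired cells can only be removed by rewriting the file, not by deleting it outright.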

So I think we can change how the new HFile created by a compaction gets its seq_id: just take the
max seq_id of the files that were compacted.
In the example above, the seq_id of f6_new would be max(6, 8, 9) = 9.
This way, files with similar timestamps will also have similar seq_ids, which increases the
chance of deleting whole HFiles and reduces compaction pressure.
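
The two ways of assigning the seq_id can be contrasted in a small sketch. The method names here are hypothetical and do not correspond to HBase APIs; the value 14 for the region's next seq_id is the one assumed in the example above.

```java
public class CompactionSeqIdSketch {
    // Behaviour described in this message for 0.98.10: the compacted file
    // takes a fresh seq_id from the region, regardless of its inputs.
    static long seqIdFromRegion(long regionNextSeqId) {
        return regionNextSeqId;
    }

    // Proposed behaviour: the compacted file takes the largest seq_id
    // among the files that were compacted into it.
    static long seqIdFromInputs(long... inputSeqIds) {
        if (inputSeqIds.length == 0) {
            throw new IllegalArgumentException("a compaction needs at least one input file");
        }
        long max = inputSeqIds[0];
        for (long id : inputSeqIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    public static void main(String[] args) {
        // f2, f3, f4 from the example, compacted into f6_new.
        System.out.println(seqIdFromRegion(14L));     // 14: f6_new sorts after f5 (12)
        System.out.println(seqIdFromInputs(6, 8, 9)); // 9: f6_new sorts before f5 (12)
    }
}
```

With the proposed rule, the file order in the example becomes f1 (4), f6_new (9), f5 (12), so f6_new stays next to files of similar age.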

So, do you think this will work?
And are there any problems if I set the seq_id like this?
