hbase-user mailing list archives

From "Hansi Klose" <hansi.kl...@web.de>
Subject double keys major_compaction
Date Thu, 01 Oct 2015 08:17:56 GMT
Hi,

We have a problem: some keys in our cluster exist twice.
The two copies have different timestamps.

I noticed these keys because we replicate the data to another cluster,
and in the target cluster we see only the keys with the newer timestamp.

We run major compactions on a regular basis in both clusters.

The table has VERSIONS => '1'

get 't1', "\x98\x04......", {COLUMN => 'd', VERSIONS => 5 }
timestamp=1442848394860, value=@\x83

get 't1', "\x98\x04......", {COLUMN => 'd', VERSIONS => 5, TIMESTAMP => 1442569821452 }
timestamp=1442569821452, value=@\x83

I thought that after a 

flush 't1'
major_compact 't1'

the key with the old timestamp would be deleted, because the table has VERSIONS => '1'.

http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
"In a major compaction, deleted key/values are removed, this new file 
doesn’t contain the tombstone markers and all the duplicate key/values
(replace value operations) are removed."
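
To make that expectation concrete, here is a minimal Python sketch of the behaviour I assumed. It is a toy model, not HBase's actual compaction code: the function name major_compact, the tuple layout, and the row key "row-1" are made up for illustration (the real row key is elided above). Only the two timestamps and the value come from the get output.

```python
from collections import defaultdict


def major_compact(cells, max_versions=1):
    """Toy model: keep only the newest max_versions cells per (row, column)."""
    by_key = defaultdict(list)
    for row, column, timestamp, value in cells:
        by_key[(row, column)].append((timestamp, value))

    kept = []
    for (row, column), versions in by_key.items():
        # Newest timestamp first, then drop everything beyond max_versions.
        versions.sort(key=lambda v: v[0], reverse=True)
        for timestamp, value in versions[:max_versions]:
            kept.append((row, column, timestamp, value))
    return kept


# "row-1" is a placeholder row key; timestamps and value are from the gets above.
cells = [
    ("row-1", "d", 1442569821452, b"@\x83"),
    ("row-1", "d", 1442848394860, b"@\x83"),
]
survivors = major_compact(cells, max_versions=1)
print(survivors)
```

Under this model, only the cell with timestamp 1442848394860 would survive, and the one with timestamp 1442569821452 would be dropped.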

But this does not happen.

After the flush and major_compact of the table, both copies of the key are still there.

We use hbase: 0.94.2-cdh4.2.0, rUnknown

Why are they still there? Do I have to delete them manually?

Regards
