hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Still Seeing Old Data After a Delete
Date Tue, 27 Mar 2012 17:19:28 GMT
Hey Shawn,

how exactly did you delete the column?
There are three types of delete markers: family, column, version.
Your observation would be consistent with having used a version delete marker, which
marks only one specific version (the latest by default) for deletion and leaves older versions visible.
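
For illustration, here is a toy in-memory model of a single cell, written in plain Java. It is not the HBase client API, and the class and method names (DeleteMarkerSketch, deleteLatestVersion, deleteAllVersions) are hypothetical. It only sketches why a version delete marker lets the previous version show up again while a column delete marker hides all of them. In the Java client of that era, the corresponding calls are, to the best of my knowledge, Delete.deleteColumn(family, qualifier) for a version marker, Delete.deleteColumns(family, qualifier) for a column marker, and Delete.deleteFamily(family) for a family marker.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of one HBase cell (row + column). NOT the HBase client API. */
public class DeleteMarkerSketch {
    // Stored versions of the cell: timestamp -> value.
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Each version delete marker masks exactly one timestamp.
    private final Set<Long> versionMarkers = new HashSet<>();
    // A column delete marker masks every version at or below its timestamp.
    private long columnMarkerTs = Long.MIN_VALUE;

    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Models a version delete: masks only the newest stored version. */
    public void deleteLatestVersion() {
        if (!versions.isEmpty()) {
            versionMarkers.add(versions.lastKey());
        }
    }

    /** Models a column delete issued at time ts: masks all versions <= ts. */
    public void deleteAllVersions(long ts) {
        columnMarkerTs = Math.max(columnMarkerTs, ts);
    }

    /** Returns the newest value not masked by any marker, or null if none. */
    public String read() {
        for (Long ts : versions.descendingKeySet()) {
            if (ts <= columnMarkerTs || versionMarkers.contains(ts)) {
                continue; // this version is hidden by a delete marker
            }
            return versions.get(ts);
        }
        return null;
    }

    public static void main(String[] args) {
        DeleteMarkerSketch cell = new DeleteMarkerSketch();
        cell.put(1332795701976L, "old");
        cell.put(1332866682295L, "new");

        cell.deleteLatestVersion();             // version marker on the newest put
        System.out.println(cell.read());        // prints "old"

        cell.deleteAllVersions(1332866682296L); // column marker covers both versions
        System.out.println(cell.read());        // prints "null"
    }
}
```

The timestamps in main are the two from the scan output quoted below; the values "old" and "new" are placeholders.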

Check out the HBase Reference Guide: http://hbase.apache.org/book.html#version.delete

Also, if you don't mind the plug, see a more detailed discussion here: http://hadoop-hbase.blogspot.com/2011/12/deletion-in-hbase.html

-- Lars

----- Original Message -----
From: Shawn Quinn <squinn@moxiegroup.com>
To: user@hbase.apache.org
Sent: Tuesday, March 27, 2012 10:01 AM
Subject: Still Seeing Old Data After a Delete


In a couple of situations we were noticing some odd problems with old data
appearing in the application, and I finally found a reproducible scenario.
Here's what we're seeing in one basic case:

1. Using a scan in the hbase shell, one of our column cells (both the column
name and value are simple longs) looks like so:

column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332795701976,

2. If we then use a "Put" to update that cell to a new value it looks as
we'd expect like so:

column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332866682295,

3. If we then use a "Delete" to remove that column, rather than the column
disappearing from the scan, we see the following again:

column=thing:\x00\x00\x00\x00\x00\x00\x00\x02, timestamp=1332795701976,

So, for some reason, at least in this case, the tombstone/delete marker
doesn't appear to be preventing new scans from seeing the old value.

Note that this is a small development cluster of HBase (version:
hbase-0.90.4-cdh3u2) which contains one master and three region servers,
and I have confirmed that the clocks are synchronized properly between the
four machines.  Also note that we're using the Java client API to run the
Put/Delete commands noted above.

Any ideas on how old data could still appear in a Get/Scan like this, and
if there are any workarounds we could try?  I saw HBASE-4536, but after
reading that thread it didn't seem pertinent to this more basic scenario.

Thanks in advance for any pointers!

