hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Can't Put Row with Same ID Twice (if using custom timestamp)
Date Tue, 02 Feb 2010 10:49:53 GMT
This is expected...

To understand why, we need to look at how deletes are handled in
HBase.  Since files in HDFS are immutable, we don't actually go
through and remove data when you ask for a 'delete'.  Instead we
insert a delete marker at a given timestamp that says 'everything
older than this time is gone'.  This delete marker (known as a
tombstone in other systems) is an explicit entry and does not go away
for a while (until the next major compaction).  During reads, we use
the delete markers to suppress 'deleted data'.

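When you insert a row with a timestamp that is older than an existing
delete marker, as here, the effect is what you see.  Your deleteall
carried no explicit timestamp, so its marker was stamped with the
current time, which is far later than 123 or 124, and the new put is
suppressed along with the old one.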
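
To make the masking concrete, here is a toy model in Python. It is not
HBase code; visible_cells and the tuple layouts are made up for
illustration, and it assumes a marker hides every cell in the same row
whose timestamp is at or before the marker's timestamp.

```python
import time

def visible_cells(cells, delete_markers):
    """Return the cells that no delete marker suppresses.

    cells:          list of (row, timestamp, value)
    delete_markers: list of (row, timestamp); assumed to hide every cell
                    of that row with a timestamp <= the marker timestamp
    """
    result = []
    for row, ts, value in cells:
        masked = any(d_row == row and ts <= d_ts
                     for d_row, d_ts in delete_markers)
        if not masked:
            result.append((row, ts, value))
    return result

# A deleteall with no explicit timestamp is stamped with the current time.
now_ms = int(time.time() * 1000)

cells = [("r1", 123, "v1")]          # step 1: put with custom timestamp 123
markers = [("r1", now_ms)]           # step 2: deleteall "r1"
cells.append(("r1", 124, "v1"))      # step 3: 124 is still older than now_ms
cells.append(("r2", 124, "v1"))      # step 4: different row, no marker

print(visible_cells(cells, markers))  # only the r2 cell survives
```

A put to r1 with a timestamp later than the marker (now_ms + 1 in this
model) would not be masked, which is why the problem disappears when no
custom timestamp is given.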

One way to "fix" this is:
put
delete
major_compact 'table'
put

During a major compaction, we prune all delete records and their
suppressed data, leaving a nice and clean file with neither deleted
data nor markers.  But normally major compaction is run at most once
a day, since on a larger cluster it is very heavy-weight - it must
rewrite the entire region of data!
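
The put / delete / major_compact / put sequence can be sketched with a
small Python toy. Again this is not HBase code: ToyStore and its methods
are hypothetical names, and the model assumes a marker hides cells of
the same row with a timestamp at or before its own.

```python
class ToyStore:
    """Toy model of one table: cells plus delete markers (tombstones)."""

    def __init__(self):
        self.cells = []    # list of (row, timestamp, value)
        self.markers = []  # list of (row, timestamp)

    def put(self, row, ts, value):
        self.cells.append((row, ts, value))

    def delete_all(self, row, ts):
        # Nothing is removed here; only a marker is recorded.
        self.markers.append((row, ts))

    def scan(self):
        # Reads filter out cells suppressed by any marker.
        return [
            (row, ts, value)
            for row, ts, value in self.cells
            if not any(m_row == row and ts <= m_ts
                       for m_row, m_ts in self.markers)
        ]

    def major_compact(self):
        # Rewrite the data keeping only visible cells, then drop markers.
        self.cells = self.scan()
        self.markers = []


store = ToyStore()
store.put("r1", 123, "v1")
store.delete_all("r1", 1265107793000)  # marker time: roughly Feb 2010 in ms
store.put("r1", 124, "v1")
before = store.scan()                   # marker still masks timestamp 124

store.major_compact()                   # marker and masked cells are pruned
store.put("r1", 124, "v1")
after = store.scan()                    # the new put is now visible

print(before)
print(after)
```

In the toy, the put made before compaction is discarded together with
the marker, so only the put issued after major_compact shows up.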

Good luck!
-ryan

On Tue, Feb 2, 2010 at 2:29 AM, Kyle Oba <kyleoba@gmail.com> wrote:
> Hi,
>
> I seem to be unable to write a row, delete it, then write it again, if I use custom
> version timestamps.
>
> As you can see from the HBase shell session below, I am:
>
> 1) creating a row with id = "r1" and custom version timestamp
> 2) deleteall from table
> 3) attempt to put another row with id = "r1", also with custom version timestamp
> 4) successfully able to create another row, with different row id = "r2"
>
> I should note that if I do NOT specify a custom timestamp, this problem does not seem
> to show up.
>
> Perhaps I'm misusing the version timestamp api?
>
> Kyle
>
>
> 1)
> hbase(main):028:0> put "capjure_test", "r1", "meta", "v1", 123
> 0 row(s) in 0.0030 seconds
> hbase(main):029:0> scan "capjure_test"
> ROW                          COLUMN+CELL
>  r1                          column=meta:, timestamp=123, value=v1
> 1 row(s) in 0.0060 seconds
>
>
> 2)
> hbase(main):030:0> deleteall "capjure_test", "r1"
> 0 row(s) in 0.0020 seconds
>
>
> 3)
> hbase(main):031:0> put "capjure_test", "r1", "meta", "v1", 124
> 0 row(s) in 0.0050 seconds
> hbase(main):032:0> scan "capjure_test"
> ROW                          COLUMN+CELL
> 0 row(s) in 0.0030 seconds
> hbase(main):033:0> flush "capjure_test"
> 0 row(s) in 0.0900 seconds
> hbase(main):034:0> scan "capjure_test"
> ROW                          COLUMN+CELL
> 0 row(s) in 0.0070 seconds
>
>
> 4)
> hbase(main):037:0> put "capjure_test", "r2", "meta", "v1", 124
> 0 row(s) in 0.0030 seconds
> hbase(main):038:0> scan "capjure_test"
> ROW                          COLUMN+CELL
>  r2                          column=meta:, timestamp=124, value=v1
> 1 row(s) in 0.0070 seconds
>
>
