hbase-user mailing list archives

From Kyle Oba <kyle...@gmail.com>
Subject Re: Can't Put Row with Same ID Twice (if using custom timestamp)
Date Tue, 02 Feb 2010 20:17:32 GMT
Ryan,

Thanks for clearing that up!  This is something that's showing up in
our test suite, since we are repeatedly creating and deleting a row.
So it's not a problem for me to run a compaction, or use a different
row-id.

So, it appears I also have the option of making sure my custom
timestamps are in "the future."
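A sketch of that workaround in the shell, using a hypothetical table
'demo' with column family 'f' (not from this thread): the delete
marker is stamped with the server's current time, so a custom
timestamp taken from the wall clock should sort above it, assuming
client and server clocks roughly agree.

  put 'demo', 'r1', 'f:q', 'v1', 123
  deleteall 'demo', 'r1'
  # the shell is JRuby, so a millisecond wall-clock timestamp can be
  # computed inline; the extra second is just margin
  put 'demo', 'r1', 'f:q', 'v1', Time.now.to_i * 1000 + 1000
  scan 'demo'    # r1 is visible: its timestamp sorts above the marker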

Much appreciated.  And at 2:49am?!?!  Thanks.

Kyle
(@mudphone)

> From: Ryan Rawson <ryanobjc@gmail.com>
> Date: February 2, 2010 2:49:53 AM PST
> To: hbase-user@hadoop.apache.org
> Subject: Re: Can't Put Row with Same ID Twice (if using custom =
timestamp)
>
> This is expected...
>
> To understand why, we need to look at how deletes are handled in
> HBase.  Since files in HDFS are immutable, we don't actually go
> through and remove data when you ask for a 'delete'.  Instead we
> insert a delete marker, at a given timestamp that says 'everything
> older than this time is gone'.  This delete marker (also known as a
> tombstone in other systems) is an explicit entry and does not go away
> for a while (until the next major compaction).  During reads, we use
> the delete markers to suppress 'deleted data'.
>
> When you insert a row with a timestamp that overlaps with a delete
> marker like this, the effect is as you see.
>
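For concreteness, a minimal shell session showing the masking; the
table 'demo' and family 'f' are hypothetical stand-ins:

  put 'demo', 'r1', 'f:q', 'v1', 100
  deleteall 'demo', 'r1'               # marker stamped with current time
  put 'demo', 'r1', 'f:q', 'v2', 101   # 101 still sorts below the marker
  scan 'demo'                          # 0 row(s): the put is suppressed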
> One way to "fix" this is:
> put
> delete
> major_compact 'table'
> put
>
> During a major compaction, we prune all delete records and their
> suppressed data, leaving a nice and clean file with no deleted data
> nor markers.  But normally major compaction is run at most 1x a day,
> since on a larger cluster it is very heavy-weight - it must rewrite
> the entire region of data!
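Spelled out with explicit timestamps against the same hypothetical
'demo' table; note that major_compact only requests the compaction, so
a test may need to wait briefly for it to finish, and a flush
beforehand may be needed so the marker lands in a store file:

  put 'demo', 'r1', 'f:q', 'v1', 123
  deleteall 'demo', 'r1'
  flush 'demo'                  # push the marker out of the memstore
  major_compact 'demo'          # prunes the marker and the masked cell
  put 'demo', 'r1', 'f:q', 'v1', 124
  scan 'demo'                   # r1 is visible again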
>
> Good luck!
> -ryan
>
> On Tue, Feb 2, 2010 at 2:29 AM, Kyle Oba <kyleoba@gmail.com> wrote:
>> Hi,
>>
>> I don't seem to be able to write a row, delete it, then write it
>> again, if I use custom version timestamps.
>>
>> As you can see from the HBase shell session below, I am:
>>
>> 1) creating a row with id = "r1" and custom version timestamp
>> 2) deleteall from table
>> 3) attempt to put another row with id = "r1", also with custom version timestamp
>> 4) successfully able to create another row, with different row id = "r2"
>>
>> I should note that if I do NOT specify a custom timestamp, this
>> problem does not seem to show up.
>>
>> Perhaps I'm misusing the version timestamp API?
>>
>> Kyle
>>
>> 1)
>> hbase(main):028:0> put "capjure_test", "r1", "meta", "v1", 123
>> 0 row(s) in 0.0030 seconds
>> hbase(main):029:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> r1                          column=meta:, timestamp=123, value=v1
>> 1 row(s) in 0.0060 seconds
>>
>> 2)
>> hbase(main):030:0> deleteall "capjure_test", "r1"
>> 0 row(s) in 0.0020 seconds
>>
>> 3)
>> hbase(main):031:0> put "capjure_test", "r1", "meta", "v1", 124
>> 0 row(s) in 0.0050 seconds
>> hbase(main):032:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> 0 row(s) in 0.0030 seconds
>> hbase(main):033:0> flush "capjure_test"
>> 0 row(s) in 0.0900 seconds
>> hbase(main):034:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> 0 row(s) in 0.0070 seconds
>>
>> 4)
>> hbase(main):037:0> put "capjure_test", "r2", "meta", "v1", 124
>> 0 row(s) in 0.0030 seconds
>> hbase(main):038:0> scan "capjure_test"
>> ROW                          COLUMN+CELL
>> r2                          column=meta:, timestamp=124, value=v1
>> 1 row(s) in 0.0070 seconds
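Consistent with the behavior described above, re-putting "r1" with no
explicit timestamp should also show up, since the server stamps the
cell with its current time, well above the delete marker. A
hypothetical continuation of this session:

  put "capjure_test", "r1", "meta", "v1"   # no timestamp: server uses now
  scan "capjure_test"                      # should return both r1 and r2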
>>


