hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Put w/ timestamp -> Deleteall -> Put w/ timestamp fails
Date Wed, 15 Aug 2012 12:50:58 GMT
Yonghu,

You are correct about that. Until a major_compact finishes, values
inserted with a timestamp older than the delete marker will never show
up in a scan. Any such old-timestamped values inserted after a delete
but before the major compaction will all go away when it runs.

That is why, in the shell output I sent, I had to put the data into the
table _after_ the major_compact had run.
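
To make the ordering concrete, here is a rough sketch in Python. It is a toy model of a single cell, not HBase code or the HBase client API: the class ToyCell and its methods are made up for illustration, and it assumes that a major compaction always drops the delete marker together with everything the marker masks.

```python
# Toy model of one HBase cell (one row/column). Hypothetical names,
# not the HBase API. Assumes a major compaction always purges delete
# markers and every version they mask.
class ToyCell:
    def __init__(self):
        self.puts = []        # list of (timestamp, value)
        self.tombstones = []  # timestamps of delete markers

    def put(self, ts, value):
        self.puts.append((ts, value))

    def delete(self, now):
        # A delete writes a marker stamped with the current time.
        self.tombstones.append(now)

    def _mask(self):
        # Newest delete marker, or -inf when there is none.
        return max(self.tombstones, default=float("-inf"))

    def get(self):
        # Reads hide every version with timestamp <= the newest marker.
        visible = [p for p in self.puts if p[0] > self._mask()]
        return max(visible, key=lambda p: p[0])[1] if visible else None

    def major_compact(self):
        # Drop masked versions, then drop the markers themselves.
        mask = self._mask()
        self.puts = [p for p in self.puts if p[0] > mask]
        self.tombstones = []


cell = ToyCell()
cell.put(10, "value")
cell.delete(now=1000)        # marker is newer than timestamp 10
cell.put(10, "value")        # old timestamp, written after the delete
masked = cell.get()          # hidden by the marker

cell.major_compact()
after_compact = cell.get()   # the masked puts were purged as well

cell.put(10, "value")        # same put, but after the compaction
visible = cell.get()
```

In this model the put issued between the delete and the compaction never becomes visible, while the identical put issued after the compaction does, which matches the shell session I sent.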

On Wed, Aug 15, 2012 at 5:18 PM, yonghu <yongyong313@gmail.com> wrote:
> Hi Harsh,
>
> I have a question about your description. The delete marker masks the
> newly inserted value with the old timestamp, which is why the newly
> inserted data can't be seen. But after major compaction, this new value
> will be seen again. So, the question is how the deletion is really
> executed. In my understanding, the deletion will delete all the data
> values whose TSs are less than or equal to the TS of the delete marker.
> So, if you insert a value with an old TS after you insert a delete
> marker, it should also be deleted at compaction time. For example, if I
> first insert (k1,t1), then delete (k1,t1) with a delete marker whose TS
> is greater than t1, and then reinsert (k1,t1), then at compaction time
> both (k1,t1) values should be deleted.
>
> Looking forward to your response!
>
> Yong
>
>
>
> On Wed, Aug 15, 2012 at 7:53 AM, Takahiko Kawasaki <daru.tk@gmail.com> wrote:
>> Dear Harsh,
>>
>> Thank you very much for your detailed explanation. I now understand
>> what was going on during my put/scan/delete operations. I'll modify
>> my application and test programs to take the timestamp implementation
>> into consideration.
>>
>> Best Regards,
>> Takahiko Kawasaki
>>
>> 2012/8/15 Harsh J <harsh@cloudera.com>
>>
>>> When a Delete occurs, a delete marker is inserted with the current
>>> time as its timestamp (to indicate it is the latest version). Hence,
>>> when you insert a value after this with an _older_ timestamp, it is
>>> not taken as the latest version and is ignored when scanning. This is
>>> why you do not see the data.
>>>
>>> If you instead insert it after a major compaction has fully run on
>>> this store file, your value will indeed show up after the insert,
>>> because at that moment no such marker with a later timestamp exists
>>> at all.
>>>
>>> hbase(main):060:0> flush 'test-table'
>>> 0 row(s) in 0.1020 seconds
>>>
>>> hbase(main):061:0> major_compact 'test-table'
>>> 0 row(s) in 0.0400 seconds
>>>
>>> hbase(main):062:0> put 'test-table', 'row4', 'test-family', 'value', 10
>>> 0 row(s) in 0.0230 seconds
>>>
>>> hbase(main):063:0> scan 'test-table'
>>> ROW                   COLUMN+CELL
>>>  row4                 column=test-family:, timestamp=10, value=value
>>> 1 row(s) in 0.0060 seconds
>>>
>>> I suppose this is why it is recommended not to mess with the
>>> timestamps manually, and instead just rely on versions.
>>>
>>> On Tue, Aug 14, 2012 at 8:24 PM, Takahiko Kawasaki <daru.tk@gmail.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a problem where 'put' with timestamp does not succeed.
>>> > I did the following at the HBase shell.
>>> >
>>> > (1) Do 'put' with timestamp.
>>> >       # 'scan' shows 1 row.
>>> >
>>> > (2) Delete the row by 'deleteall'.
>>> >       # 'scan' says "0 row(s)".
>>> >
>>> > (3) Do 'put' again by the same command line as (1).
>>> >       # 'scan' says "0 row(s)" ! Why?
>>> >
>>> > (4) Increment the timestamp value by 1 and try 'put' again.
>>> >       # 'scan' still says "0 row(s)"! Why?
>>> >
>>> > The command lines I actually typed are as follows and the attached
>>> > file is the output from the command lines.
>>> >
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row3', 'test-family', 'value'
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row3'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > deleteall 'test-table', 'row4'
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > put 'test-table', 'row4', 'test-family', 'value', 10
>>> > scan 'test-table'
>>> > quit
>>> >
>>> > Is this behavior the HBase specification?
>>> >
>>> > My cluster is built using CDH4 and the HBase version is 0.92.1-cdh4.0.0.
>>> >
>>> > Could anyone give me any insight, please?
>>> >
>>> > Best Regards,
>>> > Takahiko Kawasaki
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>



-- 
Harsh J
