hbase-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: delete using server's timestamp
Date Fri, 21 Jan 2011 23:41:55 GMT
Thanks Ryan, that clears it up.


On Fri, Jan 21, 2011 at 3:29 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> No, the storage model does not work like that.  The storage model
> revolves around the KeyValue, which is roughly:
>
> rowid/family/qualifier/timestamp/data
>
> and we store sequences of these in sorted order in HFiles.
>
> Note, we store the row key with every single version of every column/cell.
>
> Therefore there is no such thing as "removing the bytes that represent
> the actual row key", they are part of every cell, and once those cells
> go away, then so does the row key.
>
> I hope this helps,
> -ryan
>
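The point that the row key travels with every cell can be illustrated with a small model. This is a minimal Java sketch, not HBase code: Cell and CellStore are hypothetical names, strings stand in for byte arrays, and the newest-first timestamp ordering is an assumption of the sketch. It shows that the set of rows is derived from the cells, so a row key disappears exactly when its last cell does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for a KeyValue (strings instead of byte arrays).
final class Cell {
    final String row, family, qualifier, data;
    final long timestamp;

    Cell(String row, String family, String qualifier, long timestamp, String data) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.data = data;
    }
}

// Hypothetical sorted container playing the role of an HFile's contents.
final class CellStore {
    // Order by row, family, qualifier, then timestamp
    // (newest first, which is an assumption of this sketch).
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    private final TreeSet<Cell> cells = new TreeSet<>(ORDER);

    void add(Cell c) {
        cells.add(c);
    }

    // Physically drop matching cells, as a compaction would after deletes.
    void removeIf(Predicate<Cell> p) {
        cells.removeIf(p);
    }

    // There is no separate row record: the row keys are whatever the cells carry.
    List<String> rowKeys() {
        List<String> rows = new ArrayList<>();
        for (Cell c : cells) {
            if (rows.isEmpty() || !rows.get(rows.size() - 1).equals(c.row)) {
                rows.add(c.row);
            }
        }
        return rows;
    }
}
```

Adding cells for rows "r1" and "r2" and then removing every cell of "r2" leaves rowKeys() returning only "r1"; nothing else had to be cleaned up for the row key itself.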
> On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham <billgraham@gmail.com> wrote:
>> I follow the tombstone/compact/delete cycle of the column values, but
>> I'm still unclear on the row key life cycle.
>>
>> Is it that the bytes that represent the actual row key are associated
>> with and removed with each column value? Or are they removed upon
>> compaction when no column values exist for a given row key?
>>
>>
>>
>> On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>> Any of the deletes merely insert a 'tombstone' which doesn't delete the
>>> data immediately but does mark it so queries no longer return it.
>>>
>>> During the compactions we prune these delete values and they disappear
>>> for good.  (Barring other backups of course)
>>>
>>> Because of our variable-length storage model, we don't store rows in
>>> particular blocks and rewrite said blocks, so notions of rows
>>> 'existing' or not don't even apply to HBase as they do to RDBMS
>>> systems.
>>>
>>> -ryan
>>>
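The tombstone-then-compact cycle described above can be sketched in a few lines. This is a toy Java model under stated assumptions, not HBase internals: Entry and TombstoneStore are hypothetical names, a null value marks a tombstone, a tombstone masks every version of the same column at or before its timestamp, and a single compact() call stands in for whatever compaction actually prunes the markers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log entry: a null value marks a tombstone.
final class Entry {
    final String row, column, value;
    final long timestamp;

    Entry(String row, String column, long timestamp, String value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }

    boolean isTombstone() {
        return value == null;
    }
}

// Hypothetical append-only store illustrating delete markers.
final class TombstoneStore {
    private final List<Entry> entries = new ArrayList<>();

    void put(String row, String column, long ts, String value) {
        entries.add(new Entry(row, column, ts, value));
    }

    // A delete only appends a marker; no data is removed yet.
    void delete(String row, String column, long ts) {
        entries.add(new Entry(row, column, ts, null));
    }

    // True if some tombstone for the same column covers this entry's timestamp.
    private boolean masked(Entry e) {
        for (Entry t : entries) {
            if (t.isTombstone() && t.row.equals(e.row)
                    && t.column.equals(e.column) && t.timestamp >= e.timestamp) {
                return true;
            }
        }
        return false;
    }

    // Reads skip tombstones and anything they mask; newest visible version wins.
    String get(String row, String column) {
        Entry best = null;
        for (Entry e : entries) {
            if (e.isTombstone() || !e.row.equals(row)
                    || !e.column.equals(column) || masked(e)) {
                continue;
            }
            if (best == null || e.timestamp > best.timestamp) {
                best = e;
            }
        }
        return best == null ? null : best.value;
    }

    int physicalSize() {
        return entries.size();
    }

    // Compaction rewrites the data without tombstones or masked entries.
    void compact() {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.isTombstone() && !masked(e)) {
                kept.add(e);
            }
        }
        entries.clear();
        entries.addAll(kept);
    }
}
```

After a put followed by a delete, get() already returns null while physicalSize() still counts both the value and its marker; only compact() makes them disappear for good.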
>>> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham <billgraham@gmail.com> wrote:
>>>> If you use some combination of delete requests and leave a row without
>>>> any column data, will the row/rowkey still exist? I'm thinking of the
>>>> use case where you want to prune all old data, including row keys,
>>>> from a table.
>>>>
>>>>
>>>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>> There are 3 kinds of deletes (with a 4th for win):
>>>>>
>>>>> - Delete.deleteFamily(byte[] family, [long])
>>>>> -- This removes all data from the given family before the given
>>>>> timestamp, or if none is given, System.currentTimeMillis()
>>>>> - Delete.deleteColumns(byte[] family, byte[] qualifier, [long])
>>>>> -- This removes all data from the given qualifier, before the given
>>>>> timestamp, or if none is given, System.currentTimeMillis()
>>>>> - Delete.deleteColumn(byte[] family, byte[] qualifier, [long])
>>>>> -- This removes A SINGLE VERSION at the given time, or if none is
>>>>> given, the most recent version is Get'ed and deleted.
>>>>> - Delete()
>>>>> -- Calls deleteFamily() on server side on every family.
>>>>>
>>>>> Stack is talking about the deleteColumn() (singular) form.
>>>>>
>>>>> I think what you want is probably deleteColumns() (plural!), or
>>>>> perhaps deleteFamily().
>>>>>
>>>>> One rarely wants to call deleteColumn(), since it removes just a
>>>>> single version, thus exposing older versions, which MAY be what you
>>>>> want, but I'm guessing probably isn't.
>>>>>
>>>>> Only the deleteColumn() form (singular!) calls Get; the rest do
>>>>> not call Get and are very fast.
>>>>>
>>>>> -ryan
>>>>>
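The difference between deleteColumns() (plural) and deleteColumn() (singular) can be seen in a toy model of one column's versions. This is a hedged Java sketch, not the HBase client API: ColumnVersions and its method names are hypothetical, deletes are applied immediately instead of as tombstones, and "before the given timestamp" is modeled as "at or before", which is an assumption here.

```java
import java.util.TreeMap;

// Hypothetical model of a single column: timestamp -> value, oldest first.
final class ColumnVersions {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Value of the newest remaining version, or null if none is left.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Modeled on deleteColumns(): drop every version at or before ts.
    // No lookup of existing data is needed to do this.
    void deleteColumns(long ts) {
        versions.headMap(ts, true).clear();
    }

    // Modeled on deleteColumn(ts): drop exactly one version.
    void deleteColumn(long ts) {
        versions.remove(ts);
    }

    // Modeled on deleteColumn() without a timestamp: the newest timestamp
    // must be read first (the Get), then that single version is removed.
    void deleteLatestVersion() {
        if (!versions.isEmpty()) {
            versions.remove(versions.lastKey());
        }
    }
}
```

With versions at timestamps 1 and 2, deleteLatestVersion() leaves the older value visible through latest(), whereas deleteColumns(2) leaves nothing. That exposure of older versions is why the plural form is usually the one wanted.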
>>>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack <stack@duboce.net> wrote:
>>>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
>>>>>>> Is there a way to issue a delete using the server's current timestamp? I
>>>>>>> see methods using HConstants.LATEST_TIMESTAMP which is extremely expensive
>>>>>>> since it triggers a Get call.
>>>>>>
>>>>>> Yes.  Deleting the latest version involves a Get to figure out the most
>>>>>> recent timestamp.  And yes, in src code it says this is 'expensive'.
>>>>>> Seems like it does this lookup anytime LATEST_TIMESTAMP is passed,
>>>>>> whether column, columns, or family, only to ensure the delete goes in
>>>>>> ahead of whatever is currently in the Store.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>
>>>>
>>>
>>
>
