hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: How to config hbase0.94.2 to retain deleted data
Date Tue, 23 Oct 2012 18:40:48 GMT
Lars, 

No, that is not what I am suggesting. 

Perhaps I am missing something. Was the OP interested in cells or in row deletes?

Two different issues. 

On Oct 23, 2012, at 1:35 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> HBase has time range queries. You can say "give me the data as of time T" or "give me
> the data between X and Y". How far back you want to retain your data is specified via TTL
> and VERSIONS.
> 
> But... if you delete the data at T+X (X>0), a query as of time T won't return anything,
> even though at T the data was still there.
> 
> If you don't use TTL and/or VERSIONS in HBase you won't need this feature.
> 
> If you do use these, you're doing so because you want to get to the older data. And if you delete
> stuff, chances are you want KEEP_DELETED_CELLS enabled.
> That way, within the boundaries specified by TTL/VERSIONS, you can get to the data as of any
> time.
> 
> 
> By your logic nobody should use TTL/VERSIONS, which is nonsense.
> 
> 
> 
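The as-of-time argument can be made concrete with a toy in-memory model of one column (plain Java, no HBase classes; the class AsOfTimeModel and its methods are hypothetical). It assumes the rule the HBase reference guide gives for KEEP_DELETED_CELLS: deleted cells stay visible to queries whose time range ends before the delete marker's timestamp. Without the flag, the model applies every delete marker at or after the start of the query range. TTL, VERSIONS, and compaction are ignored, and the data is the OP's shell session (put at 99, put at 101, column delete at 100).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one column: puts plus column delete markers. Not HBase code. */
public class AsOfTimeModel {
    private static final class Version {
        final long ts;
        final String value;
        Version(long ts, String value) { this.ts = ts; this.value = value; }
    }

    private final List<Version> versions = new ArrayList<>();
    private final List<Long> deleteMarkers = new ArrayList<>(); // each masks versions with ts <= marker
    private final boolean keepDeletedCells;

    public AsOfTimeModel(boolean keepDeletedCells) { this.keepDeletedCells = keepDeletedCells; }

    public void put(long ts, String value) { versions.add(new Version(ts, value)); }

    /** Column delete at ts: masks every version with timestamp <= ts. */
    public void delete(long ts) { deleteMarkers.add(ts); }

    /** Values visible to a query over the half-open time range [min, max). */
    public List<String> get(long min, long max) {
        List<String> visible = new ArrayList<>();
        for (Version v : versions) {
            if (v.ts < min || v.ts >= max) continue; // outside the query's time range
            boolean masked = false;
            for (long marker : deleteMarkers) {
                // Assumed rule: with KEEP_DELETED_CELLS only markers inside [min, max) count;
                // without it, every marker at or after min counts.
                boolean applies = keepDeletedCells ? (marker >= min && marker < max) : marker >= min;
                if (applies && v.ts <= marker) { masked = true; break; }
            }
            if (!masked) visible.add(v.value);
        }
        return visible;
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            AsOfTimeModel cf = new AsOfTimeModel(keep);
            cf.put(99, "v1");   // same data as the OP's shell session
            cf.put(101, "v2");
            cf.delete(100);
            // TIMESTAMP => 99 in the shell asks for the range [99, 100)
            System.out.println("keep=" + keep + ", as of 99: " + cf.get(99, 100));
        }
    }
}
```

Running main prints an empty list for keep=false and [v1] for keep=true. The keep=false line matches the empty result the OP got for TIMESTAMP => 99.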
> ________________________________
> From: Michael Segel <michael_segel@hotmail.com>
> To: lars hofhansl <lhofhansl@yahoo.com> 
> Cc: "user@hbase.apache.org" <user@hbase.apache.org> 
> Sent: Tuesday, October 23, 2012 4:41 AM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
> 
> "Deleted cells are still subject to TTL and there will never be more than "maximum number
> of versions" deleted cells. A new "raw" scan option returns all deleted rows and the delete
> markers."
> 
> This is different from the idea suggested by the OP. Here deleted cells still get deleted.
> It's just that when a compaction comes along, it's told to ignore them.
> 
> So if I say a column can have 3 versions (cells), then every time I insert another value for that
> row:column key, I push that deleted cell down the stack. Do that enough times and it's gone.
> 
> In theory, this feature would be useful if I wanted an OLTP implementation on top of
> HBase. It would allow the transaction to bridge a compaction cycle. However, that's pretty
> much it.
> 
> This feature doesn't translate well beyond this. 
> 
> It also raises the following questions: how do I handle long-running (OLTP) transactions,
> timeouts, and isolation levels?
> 
> If you look at this at the row level... definitely not a good idea. Think of fat clogging
> an artery.
>   
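Michael's point about VERSIONS can be sketched the same way. This toy model (the class VersionLimitModel is hypothetical, not HBase code) assumes, as he describes, that retained deleted cells count against the column's VERSIONS limit and that a compaction keeps only the newest maxVersions cells, deleted or not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of one column with VERSIONS = maxVersions and deleted cells retained. Not HBase code. */
public class VersionLimitModel {
    static final class Cell {
        final long ts;
        final String value;
        boolean deleted;
        Cell(long ts, String value) { this.ts = ts; this.value = value; }
    }

    private final int maxVersions;
    private final List<Cell> cells = new ArrayList<>();

    VersionLimitModel(int maxVersions) { this.maxVersions = maxVersions; }

    void put(long ts, String value) { cells.add(new Cell(ts, value)); }

    /** Marks cells at or before ts as deleted but keeps them, as KEEP_DELETED_CELLS would. */
    void delete(long ts) {
        for (Cell c : cells) {
            if (c.ts <= ts) c.deleted = true;
        }
    }

    /** Simulated compaction: keep only the newest maxVersions cells, deleted or not. */
    void compact() {
        cells.sort(Comparator.comparingLong((Cell c) -> c.ts).reversed());
        while (cells.size() > maxVersions) cells.remove(cells.size() - 1);
    }

    boolean stillHasDeletedCell() {
        return cells.stream().anyMatch(c -> c.deleted);
    }

    public static void main(String[] args) {
        VersionLimitModel col = new VersionLimitModel(3);
        col.put(1, "a");
        col.delete(1);                 // "a" becomes a retained deleted cell
        for (long ts = 2; ts <= 4; ts++) {
            col.put(ts, "v" + ts);     // each newer put pushes "a" further down
            col.compact();
            System.out.println("after put@" + ts + ": deleted cell retained = " + col.stillHasDeletedCell());
        }
    }
}
```

With maxVersions = 3, main reports the deleted cell as retained after the puts at 2 and 3 and gone after the put at 4: three newer versions are enough to push it out.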
> On Oct 23, 2012, at 12:22 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> 
>> http://hbase.apache.org/book/cf.keep.deleted.html
>> 
>> Without it you cannot do correct as-of-time queries when it comes to deletes.
>> 
>> -- Lars
>> 
>> From: Michael Segel <michael_segel@hotmail.com>
>> To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com> 
>> Sent: Monday, October 22, 2012 9:18 PM
>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>> 
>> Ok... so what exactly does this feature mean? 
>> 
>> Suppose I have 500 rows within a region. I set this feature to be true. 
>> I do a massive delete and there are only 50 rows left standing. 
>> 
>> So if I do a count of the number of rows in the region, I see only 50, yet if I compact
>> the table, it's still full.
>> 
>> Granted, I'm talking about rows and not cells, but the idea is the same. IMHO you're
>> asking for more headaches than you solve.
>> 
>> KISS would suggest that moving deleted data into a different table would yield better
>> performance in the long run.
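The row-level scenario (500 rows, 450 of them deleted) can be modeled with a map from row key to a deleted flag. RegionCompactionModel is hypothetical; it assumes a major compaction purges deleted rows only when KEEP_DELETED_CELLS is off, and it ignores TTL and VERSIONS, which (per the documentation quoted above) still bound how long deleted cells survive.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a region's rows and a major compaction. Not HBase code. */
public class RegionCompactionModel {
    private final Map<String, Boolean> rows = new HashMap<>(); // row key -> deleted?
    private final boolean keepDeletedCells;

    RegionCompactionModel(boolean keepDeletedCells) { this.keepDeletedCells = keepDeletedCells; }

    void put(String row) { rows.put(row, false); }

    void delete(String row) { rows.replace(row, true); }

    /** What a normal count sees: rows that are not deleted. */
    long logicalRowCount() { return rows.values().stream().filter(d -> !d).count(); }

    /** Rows physically stored, including deleted ones still kept on disk. */
    int storedRowCount() { return rows.size(); }

    /** Major compaction purges deleted rows unless KEEP_DELETED_CELLS is set. */
    void majorCompact() {
        if (!keepDeletedCells) rows.values().removeIf(d -> d);
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            RegionCompactionModel region = new RegionCompactionModel(keep);
            for (int i = 0; i < 500; i++) region.put("row" + i);
            for (int i = 50; i < 500; i++) region.delete("row" + i); // the "massive delete"
            region.majorCompact();
            System.out.println("keep=" + keep + ": count=" + region.logicalRowCount()
                    + ", stored=" + region.storedRowCount());
        }
    }
}
```

main prints count=50, stored=50 without the flag and count=50, stored=500 with it, which is the gap between what a count sees and what the region still holds.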
>> 
>> 
>> On Oct 21, 2012, at 7:23 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> 
>>> That'd work too. It requires the regionservers to make remote updates to other regionservers,
>>> though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations,
>>> etc.)
>>> 
>>> 
>>> Curious, why do you think this is better than using the keep-deleted-cells feature?
>>> (It might well be, just curious)
>>> 
>>> 
>>> -- Lars
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Michael Segel <michael_segel@hotmail.com>
>>> To: user@hbase.apache.org
>>> Cc: 
>>> Sent: Sunday, October 21, 2012 4:34 PM
>>> Subject: Re: How to config hbase0.94.2 to retain deleted data
>>> 
>>> I would suggest that you use your coprocessor to copy the data to a 'backup'
>>> table when you mark them for deletion.
>>> Then, as major compaction hits, the rows are deleted from the main table but
>>> still reside undeleted in your delete table.
>>> Call it a history table.
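The history-table alternative can be sketched without HBase at all: before a value leaves the main table, copy it into a second table. HistoryTableModel and its methods are hypothetical stand-ins for a coprocessor hook, not the RegionObserver API. Following Lars's objection that every change must be trapped, put also archives the value it overwrites; the main table keeps only the latest value, as if VERSIONS were 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a main table plus a history table. Not the HBase coprocessor API. */
public class HistoryTableModel {
    // Main table: only the latest value per row (as if VERSIONS = 1).
    final Map<String, String> main = new HashMap<>();
    // History table: every value that left the main table, oldest first.
    final Map<String, List<String>> history = new HashMap<>();

    /** Copies the row's current value, if any, into the history table. */
    private void archive(String row) {
        String old = main.get(row);
        if (old != null) history.computeIfAbsent(row, k -> new ArrayList<>()).add(old);
    }

    /** Overwrites must be trapped too, or the replaced value is lost. */
    void put(String row, String value) {
        archive(row);
        main.put(row, value);
    }

    /** Stand-in for a pre-delete hook: copy the row aside, then delete it. */
    void delete(String row) {
        archive(row);
        main.remove(row);
    }

    public static void main(String[] args) {
        HistoryTableModel t = new HistoryTableModel();
        t.put("key1", "v1");
        t.put("key1", "v2");
        t.delete("key1");
        System.out.println("main=" + t.main + ", history=" + t.history);
    }
}
```

After put v1, put v2, and delete, the main table is empty and the history table holds [v1, v2] for key1. Dropping the archive call from put would silently lose v1, which is the kind of gap Lars is warning about.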
>>> 
>>> 
>>> On Oct 21, 2012, at 3:53 PM, yun peng <pengyunmomo@gmail.com> wrote:
>>> 
>>>> Hi, All,
>>>> I want to retain all deleted key-value pairs in HBase. I have tried to
>>>> configure HColumnDescriptor as follows to make it return deleted cells.
>>>> 
>>>>   public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
>>>>     HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
>>>>     HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
>>>>     hcd.setKeepDeletedCells(true);
>>>>     hcd.setBlockCacheEnabled(false);
>>>>   }
>>>> 
>>>> However, it does not work for me: when I issue a delete and then query
>>>> by an older timestamp, the old data does not show up.
>>>> 
>>>> hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
>>>> hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
>>>> hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
>>>> hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 99, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> 0 row(s) in 0.0040 seconds
>>>> 
>>>> hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 100, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> 0 row(s) in 0.0050 seconds
>>>> 
>>>> hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP => 101, VERSIONS => 4}
>>>> COLUMN                CELL
>>>> 
>>>> cf:c1                timestamp=101, value=v2
>>>> 
>>>> 1 row(s) in 0.0050 seconds
>>>> 
>>>> Note that this is a new feature in 0.94.2
>>>> (HBASE-4536 <https://issues.apache.org/jira/browse/HBASE-4536>).
>>>> I did not find much sample code online, so... does anyone here have
>>>> experience using HBASE-4536? How should one configure
>>>> HBase to enable this feature?
>>>> 
>>>> Thanks
>>>> Yun
>>> 
>> 
>> 

