hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: How to config hbase0.94.2 to retain deleted data
Date Tue, 23 Oct 2012 18:35:30 GMT
HBase has time range queries. You can say "give me the data as of time T" or "give me the data
between X and Y". How far back you want to retain your data is specified via TTL and VERSIONS.

But... If you delete the data at T+X (X>0), a query as of time T won't return anything,
even though at T the data was still there.

If you don't use TTL and/or VERSIONS in HBase you won't need this feature.

If you do use these, you're doing so because you want to get to the older data. And if you delete
stuff, chances are you want KEEP_DELETED_CELLS enabled.
So within the boundaries specified by TTL/VERSIONS you can get to the data as of any time.
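
To make the difference concrete, here is a minimal sketch of the semantics described above. It is a toy model of a single column, not HBase code: the class and method names (AsOfTimeSketch, deleteUpTo, getAsOf) are made up for illustration, TTL and VERSIONS are not modelled, and the upper bound of the query is inclusive here, whereas a real HBase time range has an exclusive upper bound. The assumption it encodes is that with KEEP_DELETED_CELLS a delete marker only hides data from queries whose time range reaches the marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Toy model of ONE column's history. Not HBase code: no TTL, no VERSIONS limit,
// no compactions. All names here are hypothetical and exist only for illustration.
class AsOfTimeSketch {

    // Either a put (value != null) or a column delete marker (value == null).
    private static final class Entry {
        final long ts;
        final String value;
        Entry(long ts, String value) { this.ts = ts; this.value = value; }
        boolean isDelete() { return value == null; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final boolean keepDeletedCells;

    AsOfTimeSketch(boolean keepDeletedCells) { this.keepDeletedCells = keepDeletedCells; }

    void put(long ts, String value) {
        entries.add(new Entry(ts, Objects.requireNonNull(value)));
    }

    // Adds a delete marker that covers every put with timestamp <= ts.
    void deleteUpTo(long ts) { entries.add(new Entry(ts, null)); }

    // Newest value visible to a query over the time range [0, asOf] (inclusive in this toy).
    Optional<String> getAsOf(long asOf) {
        Entry newest = null;
        for (Entry e : entries) {
            if (e.isDelete() || e.ts > asOf || isMasked(e, asOf)) continue;
            if (newest == null || e.ts > newest.ts) newest = e;
        }
        return newest == null ? Optional.empty() : Optional.of(newest.value);
    }

    private boolean isMasked(Entry put, long asOf) {
        for (Entry d : entries) {
            if (!d.isDelete() || d.ts < put.ts) continue;  // not a marker, or marker older than the put
            if (keepDeletedCells && d.ts > asOf) continue;  // marker lies after the queried range: ignored
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            AsOfTimeSketch col = new AsOfTimeSketch(keep);
            col.put(99, "v1");
            col.put(101, "v2");
            col.deleteUpTo(100);
            System.out.println("keepDeletedCells=" + keep
                + " asOf 99 -> " + col.getAsOf(99)
                + ", asOf 101 -> " + col.getAsOf(101));
        }
    }
}
```

With the flag off, the lookup as of 99 comes back empty, which is the behaviour shown in the shell session quoted at the bottom of this thread. With the flag on, the same lookup sees v1, because the delete marker at 100 lies outside the queried range.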


By your logic nobody should use TTL/VERSIONS, which is nonsense.



________________________________
 From: Michael Segel <michael_segel@hotmail.com>
To: lars hofhansl <lhofhansl@yahoo.com> 
Cc: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Tuesday, October 23, 2012 4:41 AM
Subject: Re: How to config hbase0.94.2 to retain deleted data
 
"Deleted cells are still subject to TTL and there will never be more than "maximum number
of versions" deleted cells. A new "raw" scan options returns all deleted rows and the delete
markers. "

This is different from the idea suggested by the OP. Here deleted cells still get deleted.
Just that when the compaction flag comes along, it's told to ignore them.

So if I say a column can have 3 versions (cells) then if I insert another value for that row:column
key, I push that deleted cell down the stack. Enough times, it's gone.

In theory, this feature would be useful if I wanted an OLTP implementation on top of HBase.
It would allow the transaction to bridge a compaction cycle. However, that's pretty much it.


This feature doesn't translate well beyond this. 

It also raises the following question: how do I handle long-transaction (OLTP) timeouts and
isolation levels?

If you look at this at the row level... definitely not a good idea. Think of fat clogging
an artery.
  
On Oct 23, 2012, at 12:22 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> http://hbase.apache.org/book/cf.keep.deleted.html
> 
> Without it you cannot do correct as-of-time queries when it comes to deletes.
> 
> -- Lars
> 
> From: Michael Segel <michael_segel@hotmail.com>
> To: user@hbase.apache.org; lars hofhansl <lhofhansl@yahoo.com> 
> Sent: Monday, October 22, 2012 9:18 PM
> Subject: Re: How to config hbase0.94.2 to retain deleted data
> 
> > 
> > Curious, why do you think this is better than using the keep-deleted-cells feature?
> > (It might well be, just curious)
> 
> Ok... so what exactly does this feature mean? 
> 
> Suppose I have 500 rows within a region. I set this feature to be true. 
> I do a massive delete and there are only 50 rows left standing. 
> 
> So if I do a count of the number of rows in the region, I see only 50, yet if I compact the table, it's still full.
> 
> Granted I'm talking about rows and not cells, but the idea is the same. IMHO you're asking for more headaches than you solve.
> 
> KISS would suggest that moving deleted data into a different table would yield better performance in the long run.
> 
> 
> On Oct 21, 2012, at 7:23 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> 
> > That'd work too. Requires the regionservers to make remote updates to other regionservers, though. And you have to trap each and every change (Put, Delete, Increment, Append, RowMutations, etc.)
> > 
> > 
> > Curious, why do you think this is better than using the keep-deleted-cells feature?
> > (It might well be, just curious)
> > 
> > 
> > -- Lars
> > 
> > 
> > 
> > ----- Original Message -----
> > From: Michael Segel <michael_segel@hotmail.com>
> > To: user@hbase.apache.org
> > Cc: 
> > Sent: Sunday, October 21, 2012 4:34 PM
> > Subject: Re: How to config hbase0.94.2 to retain deleted data
> > 
> > I would suggest that you use your coprocessor to copy the data to a 'backup' table when you mark them for delete.
> > Then as major compaction hits, the rows are deleted from the main table, but still reside undeleted in your delete table.
> > Call it a history table. 
> > 
> > 
> > On Oct 21, 2012, at 3:53 PM, yun peng <pengyunmomo@gmail.com> wrote:
> > 
> >> Hi, All,
> >> I want to retain all deleted key-value pairs in hbase. I have tried to
> >> configure the HColumnDescriptor as follows to make it return deleted cells.
> >> 
> >>  public void postOpen(ObserverContext<RegionCoprocessorEnvironment> e) {
> >>    HTableDescriptor htd = e.getEnvironment().getRegion().getTableDesc();
> >>    HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("cf"));
> >>    hcd.setKeepDeletedCells(true);
> >>    hcd.setBlockCacheEnabled(false);
> >>  }
> >> 
> >> However, it does not work for me, as when I issued a delete and then query
> >> by an older timestamp, the old data does not show up.
> >> 
> >> hbase(main):119:0> put 'usertable', "key1", 'cf:c1', "v1", 99
> >> hbase(main):120:0> put 'usertable', "key1", 'cf:c1', "v2", 101
> >> hbase(main):121:0> delete 'usertable', "key1", 'cf:c1', 100
> >> hbase(main):122:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> >> => 99, VERSIONS => 4}
> >> COLUMN                CELL
> >> 
> >> 0 row(s) in 0.0040 seconds
> >> 
> >> hbase(main):123:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> >> => 100, VERSIONS => 4}
> >> COLUMN                CELL
> >> 
> >> 0 row(s) in 0.0050 seconds
> >> 
> >> hbase(main):124:0> get 'usertable', 'key1', {COLUMN => 'cf:c1', TIMESTAMP
> >> => 101, VERSIONS => 4}
> >> COLUMN                CELL
> >> 
> >> cf:c1                timestamp=101, value=v2
> >> 
> >> 1 row(s) in 0.0050 seconds
> >> 
> >> Note this is a new feature in 0.94.2
> >> (HBASE-4536<https://issues.apache.org/jira/browse/HBASE-4536>),
> >> I did not find much sample code online, so... does anyone here have
> >> experience in using HBASE-4536? How should one configure
> >> hbase to enable this feature?
> >> 
> >> Thanks
> >> Yun
> > 
> 
> 
> 