hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Dropping a very large table - 75million rows
Date Fri, 03 Feb 2017 21:12:43 GMT
Ganesh, what I was trying to get at is that there may be a terminology issue
here. If you disable and drop the table, the cost of the operation scales with
the number of regions you have; the number of rows/entries is irrelevant.
Closing and deleting a region is a relatively fast operation.

Can you please confirm: are you trying to delete the entire table or are 
you trying to delete the *contents* of a table?

If it is the former, I stand by my "you're worried about nothing" comment :)

Ganesh Viswanathan wrote:
> Thanks Josh.
> Also, I realized I didn't give the full size of the table. It takes in
> ~75 million rows per minute and stores them for 15 days, so around
> 1.125 billion rows in total.
> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <elserj@apache.org> wrote:
>> I think you are worried about nothing, Ganesh.
>> If you want to drop (delete) the entire table, just disable and drop it
>> from the shell. This operation is not going to have a significant impact on
>> your cluster, save a few flushes, and those would only happen if there have
>> been recent writes to this table (which seems unlikely if you want to drop it).
>> Ganesh Viswanathan wrote:
>>> Hello,
>>> I need to drop an old HBase table that is quite large. It has anywhere
>>> between 2 million and 70 million data points. I turned off the count after it
>>> ran in the HBase shell for half a day. I have 4 other tables that have
>>> around 75 million rows in total and also take heavy PUT and GET traffic.
>>> What is the best practice for disabling and dropping such a large table in
>>> HBase so that I have minimal impact on the rest of the cluster?
>>> 1) I hear there are ways to disable (and drop?) specific regions. Would
>>> that work?
>>> 2) Should I scan and delete a few rows at a time until the size becomes
>>> manageable, and then disable/drop the table?
>>>     If so, what is a good number of rows to delete at a time? Should I run
>>> a major compaction on specific regions after these row deletes? And what
>>> table size can be dropped easily (and has been validated) without causing
>>> issues on the larger cluster?
>>> Thanks!
>>> Ganesh
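A quick check of the arithmetic in Ganesh's follow-up (assuming a constant ingest rate and that rows older than 15 days are aged out) suggests the two figures he gives do not match. The stated total of about 1.125 billion rows corresponds to 75 million rows per *day*, while 75 million rows per *minute* over 15 days would come to roughly 1.62 trillion rows. Either way, per Josh's point above, the cost of a disable+drop depends on the number of regions, not on this row count.

```latex
\begin{aligned}
75\times10^{6}\,\tfrac{\text{rows}}{\text{day}} \times 15\,\text{days}
  &= 1.125\times10^{9}\,\text{rows} \\
75\times10^{6}\,\tfrac{\text{rows}}{\text{min}} \times 1440\,\tfrac{\text{min}}{\text{day}} \times 15\,\text{days}
  &= 1.62\times10^{12}\,\text{rows}
\end{aligned}
```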
