hbase-user mailing list archives

From Ganesh Viswanathan <gan...@gmail.com>
Subject Re: Dropping a very large table - 75million rows
Date Fri, 03 Feb 2017 21:51:06 GMT
Hello Josh-

I am trying to delete the entire table and recover the disk space. I do not
need to pick specific contents of the table (if that's what you are asking
with #2).

My question is: would disabling and dropping such a large table affect data
locality in a bad way, or slow down the cluster when major_compaction (or
whatever cleans up the tombstoned rows) happens? I also read in another post
that dropping can spawn ZooKeeper transactions and even lock the ZooKeeper
nodes. Is there any concern around ZooKeeper functionality when dropping
large HBase tables?
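
(For clarity, by major_compaction I mean manually triggering one from the
shell, e.g., with a placeholder table name:

  hbase> major_compact 'mytable'

as opposed to the background compactions HBase schedules on its own.)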

Thanks again for taking the time to respond to my questions!

Ganesh



On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <elserj@apache.org> wrote:

> Ganesh -- I was trying to get at a possible terminology issue here. If you
> disable+drop the table, the operation is on the order of the number of
> Regions you have; the number of rows/entries is irrelevant. Closing and
> deleting a region is a relatively fast operation.
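>
> (For a sense of scale, the shell can show you the regions, e.g.:
>
>   hbase> status 'detailed'
>
> and the Master web UI also lists the regions per table.)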
>
> Can you please confirm: are you trying to delete the entire table or are
> you trying to delete the *contents* of a table?
>
> If it is the former, I stand by my "you're worried about nothing" comment
> :)
>
>
> Ganesh Viswanathan wrote:
>
>> Thanks Josh.
>>
>> Also, I realized I didn't give the full size of the table. It takes in
>> ~75 million rows per day and stores them for 15 days, so around 1.125
>> billion rows total.
>>
>> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <elserj@apache.org> wrote:
>>>
>>> I think you are worried about nothing, Ganesh.
>>>
>>> If you want to drop (delete) the entire table, just disable and drop it
>>> from the shell. This operation is not going to have a significant impact
>>> on your cluster (save a few flushes). Those would only happen if you
>>> have had recent writes to this table (which seems unlikely if you want
>>> to drop it).
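>>>
>>> Concretely, that is just the standard shell sequence (table name is a
>>> placeholder):
>>>
>>>   hbase> disable 'mytable'
>>>   hbase> drop 'mytable'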
>>>
>>>
>>> Ganesh Viswanathan wrote:
>>>
>>>> Hello,
>>>>
>>>> I need to drop an old HBase table that is quite large. It has anywhere
>>>> between 2 million and 70 million data points. I turned off the count
>>>> after it ran in the HBase shell for half a day. I have 4 other tables
>>>> that have around 75 million rows in total and also take heavy PUT and
>>>> GET traffic.
>>>>
>>>> What is the best practice for disabling and dropping such a large
>>>> table in HBase so that I have minimal impact on the rest of the
>>>> cluster?
>>>> 1) I hear there are ways to disable (and drop?) specific regions. Would
>>>> that work?
>>>> 2) Should I scan and delete a few rows at a time until the size becomes
>>>> manageable and then disable/drop the table? (A sketch of what I mean is
>>>> below.) If so, what is a good number of rows to delete at a time,
>>>> should I run a major compaction after these row deletes on specific
>>>> regions, and what is a good-sized table that can be easily dropped (and
>>>> has been validated) without causing issues on the larger cluster?
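>>>>
>>>> (The sketch I mean for option 2, with placeholder table and row names:
>>>>
>>>>    hbase> deleteall 'mytable', 'row-key-0001'
>>>>    hbase> major_compact 'mytable'
>>>>
>>>> i.e., deleteall over batches of row keys, then a major compaction to
>>>> reclaim the space.)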
>>>>
>>>>
>>>> Thanks!
>>>> Ganesh
>>>>
>>>>
>>>>
>>
