hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Dropping a very large table - 75million rows
Date Sat, 04 Feb 2017 19:20:42 GMT
Ganesh,

Just drop the table. You are worried about nothing.
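
For anyone finding this in the archive, here is a minimal sketch of the "disable and drop it from the shell" step mentioned further down the thread. It only builds the two HBase shell commands as text and does not touch a cluster. The function name `drop_table_script` and the table name `old_table` are placeholders of mine, not anything from Ganesh's setup.

```python
def drop_table_script(table: str) -> str:
    """Return HBase shell commands that disable and then drop `table`.

    `table` is whatever name you pass in; nothing here talks to HBase.
    """
    # Refuse names that would break the single-quoted shell literal.
    if "'" in table or "\n" in table:
        raise ValueError("table name must not contain quotes or newlines")
    # The shell requires a table to be disabled before it can be dropped.
    return f"disable '{table}'\ndrop '{table}'\n"


if __name__ == "__main__":
    # "old_table" is a hypothetical name used only for illustration.
    print(drop_table_script("old_table"), end="")
```

The printed text can be pasted into an interactive `hbase shell` session, or saved to a file and passed to `hbase shell` as a script. Double-check the table name first, since a drop is not reversible unless you have a snapshot or backup.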

On Feb 3, 2017 16:51, "Ganesh Viswanathan" <gansvv@gmail.com> wrote:

> Hello Josh-
>
> I am trying to delete the entire table and recover the disk space. I do not
> need to pick specific contents of the table (if that's what you are asking
> with #2).
> My question is: would disabling and dropping such a large table affect data
> locality in a bad way, or slow down the cluster when major_compaction (or
> whatever cleans up the tombstoned rows) happens? I also read in another
> post that it can spawn ZooKeeper transactions and even lock the ZooKeeper
> nodes. Is there any concern around ZooKeeper functionality when dropping
> large HBase tables?
>
> Thanks again for taking the time to respond to my questions!
>
> Ganesh
>
>
>
> On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <elserj@apache.org> wrote:
>
> > Ganesh -- I was trying to get at maybe there is a terminology issue here.
> > If you disable+drop the table, this is an operation on the order of the
> > number of Regions you have. The number of rows/entries is irrelevant.
> > Closing and deleting a region is a relatively fast operation.
> >
> > Can you please confirm: are you trying to delete the entire table or are
> > you trying to delete the *contents* of a table?
> >
> > If it is the former, I stand by my "you're worried about nothing" comment
> > :)
> >
> >
> > Ganesh Viswanathan wrote:
> >
> >> Thanks Josh.
> >>
> >> Also, I realized I didn't give the full size of the table. It takes in
> >> ~75 million rows per minute and stores for 15 days. So around 1.125 billion
> >> rows total.
> >>
> >> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <elserj@apache.org> wrote:
> >>
> >>> I think you are worried about nothing, Ganesh.
> >>>
> >>> If you want to drop (delete) the entire table, just disable and drop it
> >>> from the shell. This operation is not going to have a significant impact
> >>> on your cluster (save a few flushes). The flushes would only happen if
> >>> you have had recent writes to this table (which seems unlikely if you
> >>> want to drop it).
> >>>
> >>>
> >>> Ganesh Viswanathan wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I need to drop an old HBase table that is quite large. It has anywhere
> >>>> between 2 million and 70 million datapoints. I turned off the count after
> >>>> it ran on the HBase shell for half a day. I have 4 other tables that have
> >>>> around 75 million rows in total and also take heavy PUT and GET traffic.
> >>>>
> >>>> What is the best practice for disabling and dropping such a large table
> >>>> in HBase so that I have minimal impact on the rest of the cluster?
> >>>> 1) I hear there are ways to disable (and drop?) specific regions. Would
> >>>> that work?
> >>>> 2) Should I scan and delete a few rows at a time until the size becomes
> >>>> manageable and then disable/drop the table?
> >>>>     If so, what is a good number of rows to delete at a time, should I
> >>>> run a major compaction after these row deletes on specific regions, and
> >>>> what is a good sized table that can be easily dropped (and has been
> >>>> validated) without causing issues on the larger cluster?
> >>>>
> >>>>
> >>>> Thanks!
> >>>> Ganesh
> >>>>
> >>>>
> >>>>
> >>
>
