hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Dropping a very large table - 75million rows
Date Thu, 09 Feb 2017 15:40:46 GMT
bq. The locality of regions for OTHER tables on the same regionserver also
fell drastically

Can you be a bit more specific on how you came to the above conclusion?
Dropping one table shouldn't affect locality of other tables, unless the
number of regions on each server becomes unbalanced, which triggers balancer
activity.

Thanks
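
A minimal sketch of the balancer scenario described above, in Python. It is
a toy model, not HBase code: the Region class, the server names, the region
sizes, and the toy_balance function are all hypothetical, and the real HBase
balancer uses different logic. The sketch assumes that locality is the
fraction of a region's file bytes stored on the hosting server, and that a
region moved to another server starts with no local data until a later
compaction rewrites its files.

```python
from dataclasses import dataclass


@dataclass
class Region:
    table: str
    size_mb: int
    local_mb: int  # portion of the region's files stored on the hosting server


def table_locality(cluster, table):
    """Size-weighted locality of one table across all servers."""
    regions = [r for rs in cluster.values() for r in rs if r.table == table]
    total = sum(r.size_mb for r in regions)
    return sum(r.local_mb for r in regions) / total if total else 1.0


def drop_table(cluster, table):
    """Return a copy of the cluster without the regions of `table`."""
    return {s: [r for r in rs if r.table != table] for s, rs in cluster.items()}


def toy_balance(cluster):
    """Toy balancer (assumption): even out region counts per server.

    A moved region is modelled as having zero local data on its new server.
    """
    cluster = {s: list(rs) for s, rs in cluster.items()}
    while True:
        most = max(cluster, key=lambda s: len(cluster[s]))
        least = min(cluster, key=lambda s: len(cluster[s]))
        if len(cluster[most]) - len(cluster[least]) <= 1:
            return cluster
        moved = cluster[most].pop()
        cluster[least].append(Region(moved.table, moved.size_mb, 0))


# Hypothetical two-server cluster: region counts are balanced (4 and 4)
# only while the "big" table exists.
cluster = {
    "rs1": [Region("other", 100, 100) for _ in range(4)],
    "rs2": [Region("other", 100, 100)] + [Region("big", 1000, 1000) for _ in range(3)],
}

dropped = drop_table(cluster, "big")
balanced = toy_balance(dropped)

print("before drop:    ", table_locality(cluster, "other"))
print("after drop:     ", table_locality(dropped, "other"))
print("after rebalance:", table_locality(balanced, "other"))
```

In this toy cluster the drop by itself leaves the locality of the "other"
table unchanged; it only falls once the rebalance moves one of its regions
to the server that was emptied by the drop.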

On Thu, Feb 9, 2017 at 7:34 AM, Ganesh Viswanathan <gansvv@gmail.com> wrote:

> So here is what I observed.
> Dropping this large table had an immediate effect on average locality for
> the entire cluster. The locality of regions for OTHER tables on the same
> regionserver also fell drastically in the cluster. This was unexpected (I
> only thought locality of regions for the dropped table would be impacted).
> Is this because of compaction? Does the locality computation use the size
> of other regions on each regionserver?
>
> The large drop in locality, however, did not cause latency issues on reads
> or writes for the other tables. Why is that? Is it because I did not try to
> hit all low-locality regions?
>
> (On another note, I was able to test and perform deletions on a per-region
> basis, but that requires hbck -repair and it seemed more invasive on the
> entire cluster health.)
>
> Thanks,
> Ganesh
>
>
> On Sat, Feb 4, 2017 at 11:20 AM Josh Elser <elserj@apache.org> wrote:
>
> > Ganesh,
> >
> > Just drop the table. You are worried about nothing.
> >
> > On Feb 3, 2017 16:51, "Ganesh Viswanathan" <gansvv@gmail.com> wrote:
> >
> > > Hello Josh-
> > >
> > > I am trying to delete the entire table and recover the disk space. I do
> > > not need to pick specific contents of the table (if that's what you are
> > > asking with #2).
> > > My question is: would disabling and dropping such a large table affect
> > > data locality in a bad way, or slow down the cluster when major_compaction
> > > (or whatever cleans up the tombstoned rows) happens? I also read in
> > > another post that it can spawn zookeeper transactions and even lock the
> > > zookeeper nodes. Is there any concern around zookeeper functionality when
> > > dropping large HBase tables?
> > >
> > > Thanks again for taking the time to respond to my questions!
> > >
> > > Ganesh
> > >
> > >
> > >
> > > On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <elserj@apache.org> wrote:
> > >
> > > > Ganesh -- I was trying to get at maybe there is a terminology issue
> > > > here.
> > > > If you disable+drop the table, this is an operation on the order of
> > > > the number of Regions you have. The number of rows/entries is
> > > > irrelevant. Closing and deleting a region is a relatively fast
> > > > operation.
> > > >
> > > > Can you please confirm: are you trying to delete the entire table or
> > > > are you trying to delete the *contents* of a table?
> > > >
> > > > If it is the former, I stand by my "you're worried about nothing"
> > > > comment :)
> > > >
> > > >
> > > > Ganesh Viswanathan wrote:
> > > >
> > > >> Thanks Josh.
> > > >>
> > > > >> Also, I realized I didn't give the full size of the table. It takes
> > > > >> in ~75million rows per minute and stores for 15days. So around
> > > > >> 1.125billion rows total.
> > > > >>
> > > > >> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <elserj@apache.org> wrote:
> > > >>
> > > > >>> I think you are worried about nothing, Ganesh.
> > > > >>>
> > > > >>> If you want to drop (delete) the entire table, just disable and
> > > > >>> drop it from the shell. This operation is not going to have a
> > > > >>> significant impact on your cluster (save a few flushes). Those
> > > > >>> flushes would only happen if you have had recent writes to this
> > > > >>> table (which seems unlikely if you want to drop it).
> > > >>>
> > > >>>
> > > >>> Ganesh Viswanathan wrote:
> > > >>>
> > > > >>>> Hello,
> > > > >>>>
> > > > >>>> I need to drop an old HBase table that is quite large. It has
> > > > >>>> anywhere between 2million and 70million datapoints. I turned off
> > > > >>>> the count after it ran on the HBase shell for half a day. I have 4
> > > > >>>> other tables that have around 75million rows in total and also take
> > > > >>>> heavy PUT and GET traffic.
> > > > >>>>
> > > > >>>> What is the best practice for disabling and dropping such a large
> > > > >>>> table in HBase so that I have minimal impact on the rest of the
> > > > >>>> cluster?
> > > > >>>> 1) I hear there are ways to disable (and drop?) specific regions.
> > > > >>>> Would that work?
> > > > >>>> 2) Should I scan and delete a few rows at a time until the size
> > > > >>>> becomes manageable and then disable/drop the table?
> > > > >>>>     If so, what is a good number of rows to delete at a time,
> > > > >>>> should I run a major compaction after these row deletes on specific
> > > > >>>> regions, and what is a good sized table that can be easily dropped
> > > > >>>> (and has been validated) without causing issues on the larger
> > > > >>>> cluster?
> > > >>>>
> > > >>>>
> > > >>>> Thanks!
> > > >>>> Ganesh
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>
> > >
> >
>
