hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: How to delete a table manually
Date Fri, 21 Jan 2011 19:14:27 GMT
Thanks Ted.

On Fri, Jan 21, 2011 at 11:05 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> It seems there is a typo:
> "we'll no interrupt the running compaction" should be "we'll now interrupt
> the running compaction"
> On Fri, Jan 21, 2011 at 10:47 AM, Stack <stack@duboce.net> wrote:
>> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <wav100@gmail.com> wrote:
>> > After several hours I have figured out how to get the Disable command to
>> > work and how to delete manually, but in the process there are 4 problems I
>> > encountered that I think are areas that could be improved (or my
>> > understanding improved).
>> >
>> > 1) The client timeout is used for the disable command, which was my
>> > problem. Does this totally make sense? Should a DML-minded timeout be used
>> > for DDL statements that we know can normally take a very long time on a
>> > large cluster?
>> >
>> Sorry Wayne.  I meant to respond yesterday to your original query.
>> Enable/Disable has been redone in 0.90.  Now there are added
>> enabling/disabling states that are maintained up in zk, and in the shell
>> there are commands is_enabled and is_disabled.  We still have the same
>> (DML) timeout (sort of -- see below for more) but at least now if it
>> times out, you are not hosed.  The disable or enable process is still
>> running and you can query its state.  There is also a notion of async
>> enable/disable, though this latter facility is not exposed in the shell,
>> only in the HBaseAdmin API.
>> > 2) If the disable command fails the first time it does not "roll back".
>> > The ONLY way to proceed is to enable and then try to disable again. The
>> > first disable attempt is all that seems to work. Subsequent disable
>> > statements usually run without errors but never seem to "work". The entire
>> > table should be disabled after issuing this command or the entire table
>> > should still be enabled. I was caught in this half disabled or mostly
>> > disabled state, which was very frustrating.
>> >
>> Sorry about that.   Should be better in 0.90.0.
>> Things should run a bit faster in 0.90.0 too because disable used to
>> include an update of .META. per region plus a close of all regions
>> that make up the table.  In 0.90.0 there is no longer the .META.
>> update and close is more prompt now; in the past close would wait on
>> any running compactions to complete before proceeding.  In 0.90.0
>> we'll no interrupt the running compaction so close happens the sooner.
>> There is room for a bunch more improvement. For example, when deleting a
>> table, there should be a short-circuit that punts on the flush of in-memory
>> state and the clean close of open regions.
>> > 3) The biggest issue of all is why certain regions do not report back to
>> > the disable command. What are the various states of a region that could
>> > cause this? Compaction I know is one; what else could cause the disable
>> > command to take too long? Shouldn't a disable force itself through and wait
>> > long enough to be able to disable every region? Again, a long wait time or
>> > a more forceful operation would help.
>> >
>> It wasn't that smart in 0.20/0.89.  It's still pretty dumb but better in
>> 0.90.0.
>> Master process runs the enable/disable process in both old and new
>> HBase.  In 0.20/0.89, it was a sync process w/ master waiting on
>> regions to flip to 'offline' after successful close.  The state of
>> disabledness was when all regions in table had 'offline' state.  Any
>> hiccup, a problem closing or a failure to update .META. w/ offline per
>> region would bork the disabling process.  It was super fragile.  We
>> tried to talk it up as so.
>> In 0.90, client queues in master an executor that flips table to
>> disabling in zk and then in parallel sends out unassigns of all table
>> regions.  The executor then hangs around with a more DDL-like timeout
>> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
>> Meantime clients can check state of the disable.   After all unassigns
>> complete, the table is flipped to disabled.
>> > 4) Through all of the attempts to disable I saw regions coming and going
>> > and nothing was consistent. The UI showed the table as disabled and listed
>> > 1 region in the table (there were 1000s). The node view listed several
>> > other regions but not the same one as the table view. It was a very
>> > strange situation. The UI to browse the tables and regions is great but it
>> > would be even better if it gave a 100% view of regions and their current
>> > states. A summary view of region counts per table based on state or status
>> > would be fantastic.
>> Please file a JIRA.  Sounds like a good idea.  We could hoist stuff
>> out of the hbck tool up into the UI.
>> > There is a compaction count, but what about in split, read/write
>> > lock, disabled, etc.? What is the precise list of region states that could
>> > occur? Show a summary count per state as well as detailed state for each
>> > specific region in the list. Fundamentally this is the health monitor of
>> > the system and as a DBA I really need to know the 100% count of regions and
>> > where they are all at in terms of availability. Are they disabled, blocked
>> > for writes, blocked for reads, in compaction, etc.? If there are various
>> > states that cause disabling to be blocked, they can be reported here so
>> > that I at least know when a disable command can be executed successfully
>> > (and this should be documented).
>> >
>> Please file a JIRA.  This is great stuff.
>> Sorry for the pain caused messing w/ broken enable/disable.  It should be
>> better in 0.90 and easier to fix if there are bugs.
>> St.Ack
>> > Thanks
>> >
>> > On Thu, Jan 20, 2011 at 9:01 PM, Wayne <wav100@gmail.com> wrote:
>> >
>> >> I need to delete some tables and I am not sure of the best way to do it.
>> >> The shell does not work. The disable command says it runs ok but every
>> >> time I run drop or truncate I get an exception that says the table is not
>> >> disabled.  The UI shows it as disabled but truncate/drop still do not
>> >> work. I have even tried to restart the cluster as sometimes that makes the
>> >> disable "stick".
>> >>
>> >> What is the best way to delete a table manually? My assumption is that
>> >> with 10k regions in the 3 tables that I need to delete, the shell is not
>> >> going to work. How can I do this without a completely fresh install of
>> >> everything? How can the data/tables be removed manually without too much
>> >> pain?
>> >>
>> >> Thanks.
>> >>
>> >
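
Stack's reply to item 1 says that in 0.90 a disable keeps running after the client call times out, and that its state can be queried. A client could therefore start the disable and then poll with its own, more DDL-like deadline. The sketch below shows only that polling loop and is not HBase code: `wait_until_disabled` and the `is_disabled` callable are hypothetical names, standing in for whatever state check the client has available (for example the shell's is_disabled). The 600 second default is an assumption chosen to mirror the 10 minute figure quoted in the thread.

```python
import time
from typing import Callable


def wait_until_disabled(
    is_disabled: Callable[[], bool],
    timeout_s: float = 600.0,       # assumption: mirrors the 10 minute figure in the thread
    poll_interval_s: float = 5.0,   # assumption: arbitrary polling cadence
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_disabled() until it returns True or timeout_s elapses.

    is_disabled is a hypothetical callable supplied by the caller; it
    should return True once the table reports itself as disabled.
    Returns True if the table was seen disabled, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        # Always check at least once, even with a zero timeout.
        if is_disabled():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

A False return here means only that the deadline passed. Going by the description above, the disable may still be in progress on the master, so the caller can keep polling rather than re-issue the command.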
