hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase Outage - Drop table operation stuck in "DELETE_TABLE_PRE_OPERATION"
Date Fri, 08 Dec 2017 23:19:33 GMT
From the line number of ProcedureSyncWait.java, it seems you are using a
1.2.x release.

Can you check the master log prior to 2017-12-08 18:59?
Pastebin the relevant master log snippet (after necessary redaction).

Once we see the master log, we can see what might cause the
DeleteTableProcedure to be stuck.
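For context on why the procedure neither finishes nor fails: judging from the WARN line in the quoted log, each attempt waits a bounded time for the region to leave transition, and the resulting TimeoutIOException is logged as a retriable error, so the step is simply attempted again. The sketch below is a simplified model of that behavior under that assumption, not HBase's ProcedureSyncWait or DeleteTableProcedure code; the class name, method names, and poll/attempt counts are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of a "wait with timeout, then retry" procedure step.
 * NOT HBase code: it only mirrors the shape suggested by the WARN log line.
 */
public class RetriableWaitModel {

    /** Stand-in for HBase's TimeoutIOException (hypothetical). */
    static class WaitTimeoutException extends Exception {
        WaitTimeoutException(String msg) { super(msg); }
    }

    /** Poll `done` up to maxPolls times; throw if it never becomes true. */
    static void waitFor(String what, BooleanSupplier done, int maxPolls)
            throws WaitTimeoutException {
        for (int i = 0; i < maxPolls; i++) {
            if (done.getAsBoolean()) {
                return;
            }
            // A real implementation would sleep between polls; omitted here.
        }
        throw new WaitTimeoutException("Timed out while waiting on " + what);
    }

    /**
     * Re-run the pre-operation wait until it succeeds or maxAttempts is used up.
     * Returns how many attempts timed out. If the attempt budget were unbounded
     * and the region never settled, this loop would never end.
     */
    static int runPreOperation(BooleanSupplier regionsSettled, int maxAttempts) {
        int timeouts = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                waitFor("regions in transition", regionsSettled, 3);
                return timeouts; // regions settled: the delete could proceed
            } catch (WaitTimeoutException e) {
                timeouts++; // logged as retriable; try the same step again
                System.out.println("WARN Retriable error: " + e.getMessage());
            }
        }
        return timeouts;
    }

    public static void main(String[] args) {
        // A region stuck in OPENING never settles: every attempt times out.
        System.out.println("stuck: " + runPreOperation(() -> false, 5));

        // A region that settles on the 7th check: two attempts time out first.
        int[] checks = {0};
        System.out.println("recovering: " + runPreOperation(() -> ++checks[0] >= 7, 5));
    }
}
```

In this model, a region that never leaves OPENING makes every attempt time out, which is consistent with a drop that keeps "running" for more than 24 hours instead of failing.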

bq. Why are rebalancing and the rest of the operations stuck?

If there is a region in transition, the balancer wouldn't run.
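A minimal model of that rule, assuming for illustration that OPEN, CLOSED, and OFFLINE count as settled states and anything else counts as in transition. The class and method names are hypothetical; this is not HBase's HMaster or balancer code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model: the balancer is skipped while any region is in transition.
 * Not HBase's implementation; region states are plain strings here.
 */
public class BalancerGuardModel {

    // Assumption for this sketch: these states mean "not in transition".
    static final Set<String> SETTLED = Set.of("OPEN", "CLOSED", "OFFLINE");

    /** Returns true if the modeled balancer runs, false if it is skipped. */
    static boolean balance(Map<String, String> regionStates) {
        long inTransition = regionStates.values().stream()
                .filter(state -> !SETTLED.contains(state))
                .count();
        if (inTransition > 0) {
            System.out.println("Not running balancer: " + inTransition
                    + " region(s) in transition");
            return false;
        }
        System.out.println("Balancer ran; regions may move to the new node");
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> states = new HashMap<>();
        states.put("Queue-SCKAD,B19", "OPENING"); // the stuck region
        states.put("other-table,r1", "OPEN");     // hypothetical healthy region
        balance(states); // skipped: 1 region in transition

        states.put("Queue-SCKAD,B19", "CLOSED");  // once the transition resolves
        balance(states); // runs
    }
}
```

In this model a single region stuck in OPENING, as in the quoted report, is enough to keep the balancer from moving anything onto the newly added node.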

"hbck -repair" combines many fixes. Normally an admin is supposed to analyze
the particular inconsistencies before issuing the proper fix.
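To illustrate "analyze first, then apply a narrow fix": the sketch below maps a few inconsistency types that hbck1 reports to individual fix options that -repair otherwise bundles together, and builds a per-table command line. The mapping is a simplified assumption and the HbckFixPlanner class is hypothetical; check `hbase hbck -h` on your release before running anything it prints.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical planner: picks only the hbck1 fix flags needed for the
 * inconsistencies actually observed, instead of the catch-all -repair.
 * The inconsistency-to-flag mapping is a simplified assumption.
 */
public class HbckFixPlanner {

    enum Inconsistency {
        NOT_DEPLOYED, MULTI_DEPLOYED, NOT_IN_META,
        HOLE_IN_REGION_CHAIN, OVERLAP_IN_REGION_CHAIN
    }

    static final Map<Inconsistency, String> FIX = new EnumMap<>(Inconsistency.class);
    static {
        FIX.put(Inconsistency.NOT_DEPLOYED, "-fixAssignments");
        FIX.put(Inconsistency.MULTI_DEPLOYED, "-fixAssignments");
        FIX.put(Inconsistency.NOT_IN_META, "-fixMeta");
        FIX.put(Inconsistency.HOLE_IN_REGION_CHAIN, "-fixHdfsHoles");
        FIX.put(Inconsistency.OVERLAP_IN_REGION_CHAIN, "-fixHdfsOverlaps");
    }

    /** Narrowest command for one table; with nothing found, just report. */
    static String plan(String table, List<Inconsistency> found) {
        Set<String> flags = new LinkedHashSet<>(); // dedupe, keep first-seen order
        for (Inconsistency i : found) {
            flags.add(FIX.get(i));
        }
        if (flags.isEmpty()) {
            return "hbase hbck " + table;
        }
        return "hbase hbck " + String.join(" ", flags) + " " + table;
    }

    public static void main(String[] args) {
        System.out.println(plan("Queue-SCKAD",
                List.of(Inconsistency.NOT_DEPLOYED, Inconsistency.MULTI_DEPLOYED)));
        // hbase hbck -fixAssignments Queue-SCKAD
    }
}
```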

Cheers

On Fri, Dec 8, 2017 at 1:50 PM, Murthy boddu <snmurthyb@gmail.com> wrote:

> Hi,
>
> We recently ran into a production issue. Here is a summary of the events
> we went through, in timeline order:
>
>    1. One of the region servers went down (it became inaccessible).
>    2. Region transition was initiated, and some regions of multiple tables
>    got stuck in transition. Most of them were in status “OPEN_FAILED”,
>    “OPENING”, “PENDING”, or “CLOSE_FAILED”.
>    3. Client requests to those tables were still being routed to the lost
>    server, causing failures/timeouts. (What can we do about it?)
>    4. After waiting for many hours, we ran hbck -repair per table, which
>    resolved the issues with some of them.
>    5. One table, whose data goes stale within hours, we planned to recreate
>    to avoid any corruption. Disabling the table went through fine, but
>    dropping it got stuck at state “DELETE_TABLE_PRE_OPERATION”: it is
>    waiting for regions in transition to finish. The region it is
>    complaining about is in “OPENING” status.
>
> Here is the exception:
>
> 2017-12-08 18:59:17,975 WARN  [ProcedureExecutor-10] procedure.DeleteTableProcedure: Retriable error trying to delete table=Queue-SCKAD state=DELETE_TABLE_PRE_OPERATION
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out while waiting on regions Queue-SCKAD,B19,1502479054304.15a44cf47634d7d2264eaf00d61f6036. in transition
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:123)
>         at org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:103)
>
>    6. This operation has been running for more than 24 hours and doesn’t
>    time out (isn't there a 2-hour timeout for client operations at the
>    HBase level?). Re-enabling the table also queues up with no progress.
>    7. Because the table is in the disabled state, running hbck doesn’t
>    help, as it reports regions = 0.
>    8. We added a new node to the cluster to replace the old one, but the
>    HBase balancer doesn’t kick in at all. So, basically, region movement
>    is totally stuck.
>    9. No data is missing on HDFS; it is 100% consistent. The hbck detail
>    report on the whole cluster also returns OK.
>
> I can provide additional logs if you need them, but can you suggest how we
> can resolve this problem with the cluster? Would restarting the HBase
> master process help? We can’t afford another outage on the cluster, which
> makes the situation tricky.
>
> My questions:
>
>    1. Why does the drop operation need to wait for regions in transition
>    to finish? Is there a way we can abort the ongoing region movement or
>    even the drop operation?
>    2. Why are rebalancing and the rest of the operations stuck?
>    3. Can you please suggest what action can be taken to resolve this?
>
> Thank you for your time and help.
>
> Regards
>
