hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject RE: PENDING_CLOSE for too long
Date Mon, 31 Oct 2011 17:47:51 GMT
attached is my original email to the list, which contains code for a tool to repair your "hole"
in .META.



-----Original Message-----
From: Stuart Smith [mailto:stu24mail@yahoo.com] 
Sent: Saturday, October 29, 2011 1:39 PM
To: user@hbase.apache.org
Subject: Re: PENDING_CLOSE for too long

Hello Geoff,

  I usually don't show up here, since I use CDH, and good form means I should stay on CDH-users,
But!
  I've been seeing the same issues for months:

 - PENDING_CLOSE too long, master tries to reassign - I see a continuous stream of these.
 - WrongRegionExceptions due to overlapping regions & holes in the regions.

I just spent all day yesterday cribbing off of St.Ack's check_meta.rb script to write a java
program to fix up overlaps & holes in an offline fashion (hbase down, directly on hdfs),
and will start testing next week (cross my fingers!).
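
For readers new to the terms: a table's regions should form an unbroken chain of [start key, end key) ranges, beginning and ending with an empty key. A "hole" is a key range no region covers; an "overlap" is a key range claimed by more than one region. The following is only a rough sketch of that check, not the tool described in either message. It assumes the region boundaries have already been read out of .META. or HDFS, and it uses plain Java strings in place of HBase's byte[] row keys; `RegionChainCheck` and `Range` are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RegionChainCheck {

    /** Hypothetical stand-in for one region's [start, end) key range. */
    public static final class Range {
        final String start; // "" means "start of table"
        final String end;   // "" means "end of table"

        public Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one message per hole or overlap; an empty list means the chain is contiguous. */
    public static List<String> findProblems(List<Range> regions) {
        // Sort by start key so the regions can be swept left to right.
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.start));

        List<String> problems = new ArrayList<>();
        String covered = "";   // furthest end key covered so far
        boolean toEnd = false; // true once some region runs to the end of the table

        for (Range r : sorted) {
            if (toEnd || covered.compareTo(r.start) > 0) {
                // This region starts inside key space that is already covered.
                problems.add("overlap at start key " + r.start);
            } else if (covered.compareTo(r.start) < 0) {
                // Nothing covers the keys between 'covered' and this region's start.
                problems.add("hole: [" + covered + ", " + r.start + ")");
            }
            // Extend the covered range if this region reaches further.
            if (r.end.isEmpty()) {
                toEnd = true;
            } else if (!toEnd && r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!toEnd) {
            problems.add("hole: [" + covered + ", end of table)");
        }
        return problems;
    }
}
```

Passing the ranges ["", "b") and ["c", "") to `findProblems` reports a hole between b and c. A real tool would compare byte[] keys the way HBase does and would also have to decide how to repair each problem, which this sketch does not attempt.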

It seems like the pending close messages can be ignored?
And once I test my tool, and confirm I know a little bit about what I'm doing, maybe we could
share notes?

Take care,
  -stu



________________________________
From: Geoff Hendrey <ghendrey@decarta.com>
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org
Sent: Saturday, September 3, 2011 12:11 AM
Subject: RE: PENDING_CLOSE for too long

"Are you having trouble getting to any of your data out in tables?"

Depends on what you mean. We see corruptions from time to time that prevent
us from getting data, one way or another. Today's corruption was regions
with duplicate start and end rows. We fixed that by deleting the
offending regions from HDFS, and running add_table.rb to restore the
meta. The other common corruption is the holes in ".META." that we
repair with a little tool we wrote. We'd love to learn why we see these
corruptions with such regularity (seemingly much higher than others on
the list).

We will implement the timeout you suggest, and see how it goes.

Thanks,
Geoff

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
Stack
Sent: Friday, September 02, 2011 10:51 PM
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org
Subject: Re: PENDING_CLOSE for too long

Are you having trouble getting to any of your data out in tables?

To get rid of them, try restarting your master.

Before you restart your master, do "HBASE-4126  Make timeoutmonitor
timeout after 30 minutes instead of 3"; i.e. set
"hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
hbase-site.xml.
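
As a sketch, the setting above would look roughly like this in hbase-site.xml. The property name and value are the ones given in this message (1800000 ms is 30 minutes); the enclosing <configuration> element is the usual layout of that file, and any properties already in your file would stay alongside it.

```xml
<configuration>
  <!-- Let the master's timeout monitor wait 30 minutes (1800000 ms)
       before acting on a region stuck in transition. -->
  <property>
    <name>hbase.master.assignment.timeoutmonitor.timeout</name>
    <value>1800000</value>
  </property>
</configuration>
```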

St.Ack

On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <ghendrey@decarta.com>
wrote:
> In the master logs, I am seeing "regions in transition timed out" and
> "region has been PENDING_CLOSE for too long, running forced unassign".
> Both of these log messages occur at INFO level, so I assume they are
> innocuous. Should I be concerned?
>
>
>
> -geoff
>
>
