hbase-dev mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8473) add note to ref guide about snapshots and ec2 reverse dns requirements.
Date Wed, 01 May 2013 01:17:14 GMT
Jonathan Hsieh created HBASE-8473:

             Summary: add note to ref guide about snapshots and ec2 reverse dns requirements.
                 Key: HBASE-8473
                 URL: https://issues.apache.org/jira/browse/HBASE-8473
             Project: HBase
          Issue Type: Bug
          Components: documentation, snapshots
    Affects Versions: 0.95.0, 0.98.0
            Reporter: Jonathan Hsieh
            Assignee: Jonathan Hsieh

From IRC, from the mighty Jeremy Carroll.

17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region servers
reach the barrier, but it does not continue.
17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01 00:04:56,356
DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'backup1' coordinator notified
of 'acquire', waiting on 'reached' or 'abort' from coordinator.
17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends anything.
They just sit until the timeout.
17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then abort
is set, and it fails.
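
A minimal sketch of the member-side wait described above, assuming a simplified model in which the coordinator's 'reached' or 'abort' signal arrives on an in-process queue. This is not HBase's ZooKeeper-based procedure code; `wait_for_coordinator` and the queue are hypothetical stand-ins used only to show why a silent coordinator ends in a timeout.

```python
import queue


def wait_for_coordinator(signals, timeout_s):
    """Block until the coordinator sends 'reached' or 'abort'; give up after timeout_s."""
    try:
        return signals.get(timeout=timeout_s)
    except queue.Empty:
        # No word from the coordinator: the member gives up and the procedure fails.
        return "timeout"


silent = queue.Queue()       # models a coordinator that never sends anything
silent_result = wait_for_coordinator(silent, 0.05)

healthy = queue.Queue()
healthy.put("reached")       # models a coordinator that releases the barrier
healthy_result = wait_for_coordinator(healthy, 0.05)

print(silent_result, healthy_result)
```

In the failing case from the log, every region server is in the first situation: it has acquired, and nothing ever arrives before the timeout.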
17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the master due
to DNS resolution
17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local hostname
from the regionservers. In EC2 (where reverse DNS does not work well), the master hands the
internal name to the client.
17:25 <jeremy_carroll> jmhsieh: https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png

17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 'ip-10-155-208-202.ec2.internal,60020,1367366580066'
zNode to show up, but instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted.
Barrier is not reached
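
The mismatch above can be sketched in a few lines of plain Python. This assumes, as a simplification, that the coordinator considers the barrier reached only when every expected member name appears among the znodes the members created. The helper names are hypothetical and this is not HBase's actual code; the two server names are the ones quoted in the log.

```python
def parse_server_name(name):
    """Split a 'host,port,startcode' server name into its parts."""
    host, port, startcode = name.rstrip(",").split(",")
    return host, int(port), int(startcode)


def missing_members(expected, joined):
    """Return the expected member names that have no matching znode yet."""
    return sorted(set(expected) - set(joined))


# Name the coordinator (master) is waiting for.
expected = ["ip-10-155-208-202.ec2.internal,60020,1367366580066"]
# Name the region server actually registered under (its local hostname).
joined = ["hbasemetaclustera-d1b0a484,60020,1367366580066"]

waiting_on = missing_members(expected, joined)
# Same port and start code, different host string: the names never compare equal.
same_process = parse_server_name(expected[0])[1:] == parse_server_name(joined[0])[1:]

print("still waiting on:", waiting_on)
print("same port and start code:", same_process)
```

Both names refer to the same region server process, but because the comparison is on the full string, the expected member is never seen as joined.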
17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does not
have a reverse DNS entry. So we get stuff like this on RegionServer startup in our logs.
17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
Master passed us hostname to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that reverse DNS is working,
snapshots are working. Now to figure out how to get reverse DNS working on Route53. I
wish there were something like 'slave.host.name' inside of Hadoop for this. Looking at source
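
A rough way to check a host for this condition before taking snapshots, sketched with Python's standard `socket` module. The function name `reverse_dns_matches` and the injectable resolver arguments are assumptions made for illustration, and the IP address in the example is only inferred from the 'ip-10-155-208-202' name in the log.

```python
import socket


def reverse_dns_matches(hostname, forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Return (ok, ip, reverse_name).

    ok is True when the reverse lookup of the host's IP gives back the same
    name (case-insensitive, ignoring a trailing dot).
    """
    ip = forward(hostname)
    try:
        reverse_name = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False, ip, None
    same = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return same, ip, reverse_name


# Stand-in resolvers modelling the environment in the log (no network needed).
def fake_forward(name):
    return "10.155.208.202"  # assumed from the EC2 internal name


def fake_reverse(ip):
    return ("ip-10-155-208-202.ec2.internal", [], [ip])


result = reverse_dns_matches("hbasemetaclustera-d1b0a484", fake_forward, fake_reverse)
print(result)
```

On a real node, calling `reverse_dns_matches(socket.gethostname())` with the default resolvers would perform the actual lookups. A `False` result means the name the node uses for itself differs from the name others resolve for its address, which is the situation described in this issue.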

