hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8473) add note to ref guide about snapshots and ec2 reverse dns requirements.
Date Mon, 14 Jul 2014 04:57:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060326#comment-14060326 ]

stack commented on HBASE-8473:

Patch is good.  I could apply.  We mention reverse DNS as an old requirement at "DNS".
Should this section link to it?  Might be OK if it didn't.  I can just commit.
Troubleshooting is a good place for this info, at least for starters.  Let me just commit.

> add note to ref guide about snapshots and ec2 reverse dns requirements.
> -----------------------------------------------------------------------
>                 Key: HBASE-8473
>                 URL: https://issues.apache.org/jira/browse/HBASE-8473
>             Project: HBase
>          Issue Type: Bug
>          Components: documentation, snapshots
>    Affects Versions: 0.98.0, 0.95.0
>            Reporter: Jonathan Hsieh
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-8473.patch
> From IRC from mighty Jeremy Carroll.
> {code}
> 17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region servers reach the barrier, but it does not continue.
> 17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01 00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort' from coordinator.
> 17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends anything. They just sit until the timeout.
> 17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then abort is set, and it fails.
> ...
> 17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the master due to DNS resolution
> 17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local hostname from the regionservers. In EC2 (where reverse DNS does not work well), the master hands the internal name to the client.
> 17:25 <jeremy_carroll> jmhsieh: https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png
> 17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. Barrier is not reached
> 17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does not have a reverse DNS entry. So we get stuff like this on RegionServer startup in our logs.
> 17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
> 17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that Reverse DNS is working, snapshots are working. Now how to figure out how to get Reverse DNS working on Route53. I wished there was something like 'slave.host.name' inside of Hadoop for this. Looking at source code.
> {code}
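
The mismatch described in the log above can be illustrated with a small Python sketch. This is not HBase code: the helper names (parse_server_name, missing_barrier_members, reverse_dns_matches) are hypothetical, and the only assumption taken from the log is that a server name has the form 'host,port,startcode'. It shows why a barrier keyed on the EC2 internal name is never satisfied by a node registered under the local hostname, plus one way to check whether a hostname survives a forward and reverse DNS round trip.

```python
import socket


def parse_server_name(server_name):
    """Split 'host,port,startcode' into parts, tolerating a trailing comma."""
    host, port, startcode = server_name.rstrip(",").split(",")
    return host, int(port), int(startcode)


def missing_barrier_members(expected, arrived):
    """Return the expected server names that never appear among the arrived ones."""
    arrived_set = {parse_server_name(name) for name in arrived}
    return [name for name in expected if parse_server_name(name) not in arrived_set]


def reverse_dns_matches(hostname, resolve=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Check that hostname -> IP -> hostname round-trips to the same name.

    The resolver functions are parameters so the check can be exercised
    without touching real DNS.
    """
    try:
        ip = resolve(hostname)
        reverse_name = reverse(ip)[0]
    except OSError:
        # Covers socket.gaierror and socket.herror (no forward or reverse entry).
        return False
    return reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")


# Names taken from the IRC log: the coordinator waits for the first,
# but the region server registers itself under the second.
expected = ["ip-10-155-208-202.ec2.internal,60020,1367366580066"]
arrived = ["hbasemetaclustera-d1b0a484,60020,1367366580066,"]

# The port and startcode agree, but the host part differs, so the expected
# member is reported as missing and the barrier would never be reached.
print(missing_barrier_members(expected, arrived))
```

Running reverse_dns_matches(socket.getfqdn()) on each master and region server host is one possible way to spot the condition before taking a snapshot; whether it matches the exact lookup HBase performs is an assumption.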

This message was sent by Atlassian JIRA