hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8473) add note to ref guide about snapshots and ec2 reverse dns requirements.
Date Thu, 10 Jul 2014 04:28:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057110#comment-14057110 ]

Hadoop QA commented on HBASE-8473:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision .
  ATTACHMENT ID: 12654927

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+0 tests included{color}.  The patch appears to be a documentation patch
that doesn't require tests.

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10012//console

This message is automatically generated.

> add note to ref guide about snapshots and ec2 reverse dns requirements.
> -----------------------------------------------------------------------
>                 Key: HBASE-8473
>                 URL: https://issues.apache.org/jira/browse/HBASE-8473
>             Project: HBase
>          Issue Type: Bug
>          Components: documentation, snapshots
>    Affects Versions: 0.98.0, 0.95.0
>            Reporter: Jonathan Hsieh
>            Assignee: Misty Stanley-Jones
>         Attachments: HBASE-8473.patch
> From IRC from mighty Jeremy Carroll.
> {code}
> 17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region servers reach the barrier, but it does not continue.
> 17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs: 2013-05-01 00:04:56,356 DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort' from coordinator.
> 17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends anything. They just sit until the timeout.
> 17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then abort is set, and it fails.
> ...
> 17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the master due to DNS resolution
> 17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local hostname from the regionservers. In EC2 (where reverse DNS does not work well), the master hands the internal name to the client.
> 17:25 <jeremy_carroll> jmhsieh: https://s3.amazonaws.com/uploads.hipchat.com/23947/185789/au94meik0h3y5ii/Screen%20Shot%202013-04-30%20at%2017.25.50.png
> 17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. Barrier is not reached
> 17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does not have a reverse DNS entry. So we get stuff like this on RegionServer startup in our logs.
> 17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
> 17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that Reverse DNS is working, snapshots are working. Now how to figure out how to get Reverse DNS working on Route53. I wished there was something like 'slave.host.name' inside of Hadoop for this. Looking at source code.
> {code}
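
The failure described in the IRC log above can be illustrated with a small toy model. This is a sketch only, not HBase code: the function names (server_name, missing_members, barrier_reached) are hypothetical, and the only values taken from the report are the two hostnames, the port, and the start code. It assumes the coordinator waits for one entry per expected server name and treats the barrier as reached only when every expected name has appeared.

```python
# Toy model of the snapshot barrier described in the IRC log.
# NOT HBase code: every function name here is hypothetical.

def server_name(hostname, port, startcode):
    """Format a name the way the log shows it: host,port,startcode."""
    return f"{hostname},{port},{startcode}"

def missing_members(expected, joined):
    """Return the expected member names that never showed up at the barrier."""
    return sorted(set(expected) - set(joined))

def barrier_reached(expected, joined):
    """The barrier counts as reached only when every expected name has joined."""
    return not missing_members(expected, joined)

# Name the master is waiting for (the internal EC2 name from the log).
expected = [server_name("ip-10-155-208-202.ec2.internal", 60020, 1367366580066)]

# Name the region server registers under (its local hostname from the log).
joined = [server_name("hbasemetaclustera-d1b0a484", 60020, 1367366580066)]

print(barrier_reached(expected, joined))
print(missing_members(expected, joined))
```

Under these assumptions the two names never compare equal, so the expected member stays missing until the timeout, which matches the behaviour reported in the log. Once both sides agree on one hostname (for example after reverse DNS is fixed), the same check passes.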

This message was sent by Atlassian JIRA
