falcon-dev mailing list archives

From "Balu Vellanki (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FALCON-2090) HDFS Snapshot failed with UnknownHostException when scheduling in HA Mode
Date Wed, 20 Jul 2016 06:06:20 GMT

     [ https://issues.apache.org/jira/browse/FALCON-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balu Vellanki updated FALCON-2090:
----------------------------------
    Fix Version/s: 0.10

> HDFS Snapshot failed with UnknownHostException when scheduling in HA Mode
> -------------------------------------------------------------------------
>
>                 Key: FALCON-2090
>                 URL: https://issues.apache.org/jira/browse/FALCON-2090
>             Project: Falcon
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: trunk
>            Reporter: Murali Ramasami
>            Assignee: Balu Vellanki
>            Priority: Critical
>             Fix For: trunk, 0.10
>
>
> In NameNode HA mode, scheduling an HDFS snapshot replication fails with "java.net.UnknownHostException: mycluster1". In the error message, mycluster1 is the nameservice of the source cluster. The complete stack trace follows.
> Stack Trace:
> {noformat}
> Log Contents:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/371/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/213/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Error: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster1
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:429)
> 	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:207)
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2730)
> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2764)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2746)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:178)
> 	at org.apache.falcon.hive.util.EventUtils.initializeFS(EventUtils.java:145)
> 	at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:47)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.net.UnknownHostException: mycluster1
> 	... 19 more
> {noformat}
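One possible reading of the stack trace above, offered as an assumption rather than a confirmed root cause: the mapper's Hadoop configuration does not describe mycluster1 as an HA nameservice, so WebHdfsFileSystem treats the name as an ordinary host and the lookup in SecurityUtil.buildTokenService fails. The Python sketch below checks a configuration (modeled as a plain dict) for the usual client-side HA keys for an endpoint's authority. The helper name missing_ha_keys, the sample dict, and the example.com host names are hypothetical; only the dfs.* key names are standard HDFS settings.

```python
from urllib.parse import urlparse

# NameNode address key prefix per URI scheme (standard HDFS HA client keys).
ADDRESS_KEY = {
    "hdfs": "dfs.namenode.rpc-address",
    "webhdfs": "dfs.namenode.http-address",
}


def _csv(value):
    """Split a comma-separated config value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_keys(endpoint, conf):
    """Return the HA client keys that conf lacks for the endpoint's authority.

    An empty list means the authority is described as a logical nameservice.
    """
    parsed = urlparse(endpoint)
    ns = parsed.hostname
    missing = []
    if ns not in _csv(conf.get("dfs.nameservices", "")):
        missing.append("dfs.nameservices")
    nn_ids = _csv(conf.get(f"dfs.ha.namenodes.{ns}", ""))
    if not nn_ids:
        missing.append(f"dfs.ha.namenodes.{ns}")
    prefix = ADDRESS_KEY.get(parsed.scheme, "dfs.namenode.rpc-address")
    for nn in nn_ids:
        key = f"{prefix}.{ns}.{nn}"
        if key not in conf:
            missing.append(key)
    provider = f"dfs.client.failover.proxy.provider.{ns}"
    if provider not in conf:
        missing.append(provider)
    return missing


# Hypothetical job configuration that only knows the target nameservice.
target_only_conf = {
    "dfs.nameservices": "mycluster2",
    "dfs.ha.namenodes.mycluster2": "nn1,nn2",
    "dfs.namenode.rpc-address.mycluster2.nn1": "nn1.example.com:8020",
    "dfs.namenode.rpc-address.mycluster2.nn2": "nn2.example.com:8020",
    "dfs.namenode.http-address.mycluster2.nn1": "nn1.example.com:50070",
    "dfs.namenode.http-address.mycluster2.nn2": "nn2.example.com:50070",
    "dfs.client.failover.proxy.provider.mycluster2":
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
}

print(missing_ha_keys("webhdfs://mycluster1:20070", target_only_conf))
print(missing_ha_keys("webhdfs://mycluster2:20070", target_only_conf))
```

With a configuration that only defines mycluster2, the first call reports the keys missing for mycluster1 and the second returns an empty list.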
> Steps to Reproduce:
> primaryCluster:
> ============
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="primaryCluster">
>    <interfaces>
>       <interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2" />
>       <interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2" />
>       <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-12.openstacklocal:8050" version="0.20.2" />
>       <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-14.openstacklocal:11000/oozie" version="3.1" />
>       <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-9.openstacklocal:61616?daemon=true" version="5.1.6" />
>       <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" version="0.11.0" />
>    </interfaces>
>    <locations>
>       <location name="staging" path="/tmp/fs" />
>       <location name="temp" path="/tmp" />
>       <location name="working" path="/tmp/fw" />
>    </locations>
>    <ACL owner="hrt_qa" group="users" permission="0755" />
>    <properties>
>       <property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM" />
>       <property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM" />
>       <property name="hive.metastore.sasl.enabled" value="true" />
>       <property name="hadoop.rpc.protection" value="authentication" />
>       <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" />
>       <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-14.openstacklocal:10000" />
>    </properties>
> </cluster>
> {noformat}
> falcon entity -submit -type cluster -file primaryCluster.xml --> primaryCluster
> backupCluster :
> ============
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="backupCluster">
>    <interfaces>
>       <interface type="readonly" endpoint="webhdfs://mycluster2:20070" version="0.20.2" />
>       <interface type="write" endpoint="hdfs://mycluster2:8020" version="0.20.2" />
>       <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-5.openstacklocal:8050" version="0.20.2" />
>       <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-6.openstacklocal:11000/oozie" version="3.1" />
>       <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-1.openstacklocal:61616" version="5.1.6" />
>       <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" version="0.11.0" />
>    </interfaces>
>    <locations>
>       <location name="staging" path="/tmp/fs" />
>       <location name="temp" path="/tmp" />
>       <location name="working" path="/tmp/fw" />
>    </locations>
>    <ACL owner="hrt_qa" group="users" permission="0755" />
>    <properties>
>       <property name="dfs.namenode.kerberos.principal" value="nn/_HOST@EXAMPLE.COM" />
>       <property name="hive.metastore.kerberos.principal" value="hive/_HOST@EXAMPLE.COM" />
>       <property name="hive.metastore.sasl.enabled" value="true" />
>       <property name="hadoop.rpc.protection" value="authentication" />
>       <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" />
>       <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-6.openstacklocal:10000" />
>    </properties>
> </cluster>
> {noformat}
> falcon entity -submit -type cluster -file backupCluster.xml --> backupCluster
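To see which storage endpoints in a cluster definition rely on the nameservice name, the interfaces can be read out of the entity XML. A minimal Python sketch, using a trimmed copy of the primaryCluster definition above (XML declaration and non-storage elements omitted); the function name storage_hosts is hypothetical and is not part of Falcon:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace declared on the <cluster> element of the entity definition.
FALCON_NS = {"f": "uri:falcon:cluster:0.1"}

# Trimmed copy of the primaryCluster definition from the report.
CLUSTER_XML = """<cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon" description="oregonHadoopCluster" name="primaryCluster">
   <interfaces>
      <interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2" />
      <interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2" />
   </interfaces>
</cluster>
"""


def storage_hosts(xml_text):
    """Map the readonly and write interface types to the host part of their endpoints."""
    root = ET.fromstring(xml_text)
    hosts = {}
    for iface in root.findall("f:interfaces/f:interface", FALCON_NS):
        if iface.get("type") in ("readonly", "write"):
            hosts[iface.get("type")] = urlparse(iface.get("endpoint")).hostname
    return hosts


print(storage_hosts(CLUSTER_XML))
```

Both storage interfaces of primaryCluster resolve to mycluster1, the name that appears in the UnknownHostException.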
> HDFS Snapshot Replication:
> =========================
> Source:
> ======
> hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/source
> hdfs dfs -put /grid/0/hadoopqe/tests/ha/falcon/combinedActions/mr_input/2015/01/02/NYSE-2000-2001.tsv /tmp/falcon-regression/HDFSSnapshotTest/source
> Create Snapshot :
> ===============
> hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source [ hdfs]
> hdfs dfs -createSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source [ hrt_qa]
> hdfs lsSnapshottableDir [ hrt_qa]
> hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/source/.snapshot
> Target:
> ======
> hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/target
> hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/target
> hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/target/.snapshot
> hdfs-snapshot.properties
> ==========================
> {noformat}
> jobName=HDFSSnapshotTest
> jobClusterName=primaryCluster
> jobValidityStart=2016-05-09T06:25Z
> jobValidityEnd=2017-05-09T08:00Z
> jobFrequency=days(1)
> sourceCluster=primaryCluster
> sourceSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/source
> sourceSnapshotRetentionAgeLimit=days(1)
> sourceSnapshotRetentionNumber=3
> targetCluster=backupCluster
> targetSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/target
> targetSnapshotRetentionAgeLimit=days(1)
> targetSnapshotRetentionNumber=3
> jobAclOwner=hrt_qa
> jobAclGroup=users
> jobAclPermission="0x755"
> {noformat}
> falcon extension -extensionName hdfs-snapshot-mirroring -submitAndSchedule -file hdfs-snapshot.properties



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
