hadoop-hdfs-user mailing list archives

From Oren <or...@infolinks.com>
Subject reduce network problem after using cache dns
Date Wed, 04 Jan 2012 15:08:36 GMT
Hi,
I have a small Hadoop grid connected by a 1 Gb network.
When the servers are configured to use the local DNS server, jobs run
without a problem and the copy speed during reduce is tens of MB/s.
Once I switch the servers to a caching-only named server on each node,
I start to get failed tasks with timeout errors.
Also, the copy speed drops to under 1 MB/s.

There is NO degradation in the network itself: copying files between
servers still runs at tens of MB/s.
Name resolution works and is about equally fast (give or take) with both
configurations.
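
Since the reduce-side trace further down fails inside java.net.InetAddress.getAllByName, it may help to check resolution from inside a JVM rather than only with command-line tools. The following is a minimal sketch, not part of the job code: the class name DnsCheck and its two helper methods are hypothetical names made up for illustration. It times a forward lookup and prints the canonical (reverse) name for each host given on the command line.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: checks DNS lookups as seen by the JVM on this node. */
class DnsCheck {

    /**
     * Forward lookup through InetAddress.getAllByName, the call that
     * throws in the trace. Returns elapsed milliseconds, or -1 if the
     * name cannot be resolved.
     */
    static long forwardLookupMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return -1L;
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    /**
     * Reverse lookup of the first address the host resolves to.
     * Returns null if the forward lookup fails; if only the reverse
     * lookup fails, the JVM returns the textual IP address instead.
     */
    static String canonicalName(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String host : args) {
            long ms = forwardLookupMillis(host);
            System.out.println(host
                    + ": forward=" + (ms < 0 ? "FAILED" : ms + " ms")
                    + ", canonical=" + canonicalName(host));
        }
    }
}
```

It could be run on each node under both DNS configurations, for example as `java DnsCheck s06.xxx.local`. The JVM caches lookup results, so only the first lookup of a name in each run reflects the resolver itself.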

Any idea what happens during the map/reduce process that causes this
behavior?
This is an example of the exceptions I get during map:
Too many fetch-failures

and during reduce:
java.lang.RuntimeException: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:38)
    at org.apache.hadoop.hbase.client.HTablePool.createHTable(HTablePool.java:129)
    at org.apache.hadoop.hbase.client.HTablePool.getTable(HTablePool.java:89)
    at com.infolinks.hadoop.commons.hbase.HBaseOperations.getTable(HBaseOperations.java:118)
    at com.infolinks.hadoop.framework.HBaseReducer.setup(HBaseReducer.java:71)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.net.UnknownHostException: s06.xxx.local
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
    ... 8 more
Caused by: java.net.UnknownHostException: s06.xxx.local
    at java.net.InetAddress.getAllByName0(InetAddress.java:1158)
    at java.net.InetAddress.getAllByName(InetAddress.java:1084)
    at java.net.InetAddress.getAllByName(InetAddress.java:1020)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
    at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:97)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:119)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:998)
    ... 13 more

Thank you,
Oren.

