hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ZooKeeper/Troubleshooting" by PatrickHunt
Date Mon, 30 Nov 2009 18:15:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/Troubleshooting" page has been changed by PatrickHunt.
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting?action=diff&rev1=8&rev2=9

--------------------------------------------------

  As told by a user:
  
  "This [[https://issues.apache.org/jira/browse/ZOOKEEPER-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706402#action_12706402|issue]]
is clearly linked to heavy utilization or swapping on the clients. I find that if I keep the
clients from swapping that this error materializes relatively infrequently, and when it does
materialize it is linked to a sudden increase in load. For example, the concurrent start of
100 clients on 14 machines will sometimes trigger this issue. <...> All in all, it is
my sense that Java processes must avoid swapping if they want to have not just timely but
also reliable behavior."
+ 
+ As told by an HBase user:
+ 
+ "After looking at ganglia
+ history, it's clear that the nodes in question were starved of memory,
+ swapping like crazy.  The expired scanner lease, the region shutting down,
+ and as you noted, the Zookeeper session expiry, were not a causal chain, but
+ all the result of the machine grinding to a halt from swapping.  The
+ MapReduce tasks were allocated too much memory, and an apparent memory leak
+ in the job we were running was causing the tasks to eat into the
+ RegionServer's share of the machine's memory.  I've reduced the memory
+ allocated to tasks in hadoop's "mapred.child.java.opts" to ensure that the
+ HADOOP_HEAPSIZE + total maximum memory allocated to tasks + the
+ HBASE_HEAPSIZE is not greater than the memory available on the machine."
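
The sizing rule in the report above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper names, the slot count, and all the numbers are illustrative assumptions. It assumes HADOOP_HEAPSIZE and HBASE_HEAPSIZE are given in MB, that the task heap comes from an -Xmx flag in "mapred.child.java.opts", and that the number of concurrent tasks per node is known. A JVM also uses memory beyond its heap, so the real footprint is likely higher than this sum.

```python
import re


def xmx_mb(java_opts):
    """Extract the -Xmx value from a JVM options string, in MB.

    Only values with a k/m/g suffix are handled in this sketch.
    """
    m = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if m is None:
        raise ValueError("no -Xmx setting with a k/m/g suffix in %r" % java_opts)
    value, unit = int(m.group(1)), m.group(2).lower()
    return {"k": value / 1024, "m": value, "g": value * 1024}[unit]


def worst_case_heap_mb(hadoop_heapsize_mb, hbase_heapsize_mb,
                       child_java_opts, max_task_slots):
    """Sum the heaps as in the report: Hadoop + all tasks + HBase."""
    task_total = max_task_slots * xmx_mb(child_java_opts)
    return hadoop_heapsize_mb + task_total + hbase_heapsize_mb


def fits(total_mb, machine_mb):
    """True if the worst-case heap total does not exceed machine memory."""
    return total_mb <= machine_mb


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the report).
    total = worst_case_heap_mb(1000, 4000, "-Xmx512m", max_task_slots=6)
    print(total, fits(total, machine_mb=8192))
```

If the check fails, the report's remedy applies: lower the -Xmx value in "mapred.child.java.opts" (or run fewer concurrent tasks) until the total fits, leaving headroom for the operating system.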
  
  
  === Hardware misconfiguration - NIC ===
