giraph-dev mailing list archives

From "Avery Ching (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-356) Help debug ZooKeeper issues
Date Fri, 05 Oct 2012 20:46:02 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-356:
-------------------------------

    Attachment: GIRAPH-356.2.patch

Updated patch to address all the ZooKeeper issues I could find at scale.

- Configurable ZooKeeper connection attempts, min/max session timeout, force sync (off for perf), skip ACLs (no for perf)
- Do not kill the job on a disconnect event; the client can still connect again, and only an expired session is fatal
- Dump failed workers on the master when a superstep does not get started due to missing ZooKeeper health
- Dump the last 100 lines of the ZooKeeper process stdout/stderr when there is a failure that could be related to ZooKeeper
- Small change for a more descriptive message when the last good checkpoint can't be found
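
The disconnect-versus-expiry rule in the list above can be sketched roughly as follows. This is an illustration only, not the code in the patch: SessionEventPolicy, SessionState, and Action are hypothetical names, and the enum only mirrors three of the session states ZooKeeper reports to watchers (SyncConnected, Disconnected, Expired) so that the example runs without the ZooKeeper library.

```java
public class SessionEventPolicy {
    /** Hypothetical stand-in for the session states ZooKeeper reports to a watcher. */
    public enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** Hypothetical reactions a job could take to a session event. */
    public enum Action { CONTINUE, WAIT_FOR_RECONNECT, FAIL_JOB }

    /** Maps a session state to a reaction: only an expired session is fatal. */
    public static Action decide(SessionState state) {
        switch (state) {
            case DISCONNECTED:
                // The client library may still re-establish the connection
                // within the session timeout, so the job keeps waiting.
                return Action.WAIT_FOR_RECONNECT;
            case EXPIRED:
                // The session is gone and cannot be recovered.
                return Action.FAIL_JOB;
            default:
                return Action.CONTINUE;
        }
    }

    public static void main(String[] args) {
        for (SessionState state : SessionState.values()) {
            System.out.println(state + " -> " + decide(state));
        }
    }
}
```

Keeping the decision in one small function makes the "disconnect is not fatal" rule easy to test separately from the watcher callback that would call it.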
                
> Help debug ZooKeeper issues
> ---------------------------
>
>                 Key: GIRAPH-356
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-356
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-356.2.patch, GIRAPH-356.patch
>
>
> Currently, if the ZooKeeper process fails, we have little information on why or what happened.  This patch addresses this by keeping the last 100 log lines and dumping them when the map task fails with a RuntimeException.
> Here is an example of a master task failure when an invalid JVM argument is passed to ZooKeeper.  The error is much more obvious now.
> 2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
> 2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Unrecognized option: -BadOpt
> 2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Could not create the Java virtual machine.
> 2012-10-04 15:05:28,919 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2012-10-04 15:05:28,959 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.IllegalStateException: run: Caught an unrecoverable exception onlineZooKeeperServers: Failed to connect in 5 tries!
>                                  at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:591)
>                                  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>                                  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>                                  at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>                                  at java.security.AccessController.doPrivileged(Native Method)
>                                  at javax.security.auth.Subject.doAs(Subject.java:396)
>                                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>                                  at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.lang.IllegalStateException: onlineZooKeeperServers: Failed to connect in 5 tries!
>        at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:721)
>        at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:328)
>        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:573)
>        ... 7 more
> 2012-10-04 15:05:28,963 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
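
The "keep the last 100 lines and dump them on failure" idea described in the issue can be sketched as a small bounded buffer. This is a minimal illustration under stated assumptions, not the StreamCollector from the patch: TailCollector and its methods are hypothetical names, and the sketch assumes the output is read line by line as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper that remembers only the most recent lines of a stream. */
public class TailCollector {
    private final int maxLines;
    private final Deque<String> lines = new ArrayDeque<>();

    public TailCollector(int maxLines) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.maxLines = maxLines;
    }

    /** Adds one line, evicting the oldest line once the buffer is full. */
    public synchronized void add(String line) {
        if (lines.size() == maxLines) {
            lines.removeFirst();
        }
        lines.addLast(line);
    }

    /** Reads the stream to its end, keeping only the last maxLines lines. */
    public void drain(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a copy of the retained lines, oldest first. */
    public synchronized List<String> snapshot() {
        return new ArrayList<>(lines);
    }
}
```

One possible use is to start the child process with ProcessBuilder.redirectErrorStream(true), call drain(process.getInputStream()) on a background thread, and log snapshot() only when the task is about to fail. That keeps memory use bounded regardless of how much the process prints.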

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
