giraph-dev mailing list archives

From "Hassan Eslami (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-1026) New Out-of-core mechanism does not work
Date Fri, 07 Aug 2015 18:28:46 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662220#comment-14662220 ]

Hassan Eslami commented on GIRAPH-1026:
---------------------------------------

Thanks, Vitaly, for describing the issues. I have been working on a new patch, an add-on to flow
control, that may solve the problem. The patch is ready in GIRAPH-1027, where you can also access the code.

One of the reasons you get OutOfMemory errors is that there is no limit on the number of messages
workers send. You can try either of these solutions:
 1) -Dgiraph.waitForRequestsConfirmation=true and -Dgiraph.maxNumberOfOpenRequests=1000
 2) Apply the patch in GIRAPH-1027, and try -Dgiraph.waitForPerWorkerRequests=true and -Dgiraph.maxNumberOfUnsentRequests=1000

Note that the number "1000" in both cases is the number of requests a worker may keep in sender
memory. Assuming that each request is about 0.5MB by default, 1000 requests take about 500MB, so
you can adjust the number as you change your memory limits.
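
As a rough illustration of that sizing rule, here is a minimal shell sketch. It assumes the 0.5MB-per-request figure above holds for your job, which it may not; the helper names max_requests and flow_control_flags are hypothetical and not part of Giraph. The two flags it prints are the ones from solution 1.

```shell
#!/bin/sh
# Sketch only: derive a request cap from a sender-side memory budget.
# Assumption (from the comment above): one request is roughly 0.5 MB,
# so a budget of B megabytes holds about 2 * B requests.
# max_requests and flow_control_flags are hypothetical helper names.

max_requests() {
  budget_mb=$1
  # 1 request ~ 0.5 MB, i.e. 2 requests per MB (integer arithmetic).
  echo $(( budget_mb * 2 ))
}

flow_control_flags() {
  n=$(max_requests "$1")
  # Solution 1 from the comment above, with the computed cap filled in.
  echo "-Dgiraph.waitForRequestsConfirmation=true -Dgiraph.maxNumberOfOpenRequests=$n"
}

# A 500 MB budget reproduces the suggested value of 1000.
flow_control_flags 500
```

The printed flags would go on the GiraphRunner command line next to the other -D options. If you apply the GIRAPH-1027 patch instead, the same arithmetic should apply to the flags named in solution 2.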

As for your question regarding the previous implementation of out-of-core messages, note
that the new out-of-core mechanism handles messages automatically, so you do
not need to use the old out-of-core messages.

> New Out-of-core mechanism does not work
> ---------------------------------------
>
>                 Key: GIRAPH-1026
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1026
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 1.2.0-SNAPSHOT
>            Reporter: Max Garmash
>
> After the new OOC mechanism was released, we tried to test it on our data, and it failed.
> Our environment:
> 4x (CPU 6 cores / 12 threads, RAM 64GB)
> We can successfully process about 75 million vertices.
> With 100-120M vertices it fails like this:
> {noformat}
> 2015-08-04 12:35:21,000 INFO  [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster (GiraphApplicationMaster.java:onContainersCompleted(574)) - Got container status for containerID=container_1438068521412_0193_01_000005, state=COMPLETE, exitStatus=-104, diagnostics=Container [pid=6700,containerID=container_1438068521412_0193_01_000005] is running beyond physical memory limits. Current usage: 20.3 GB of 20 GB physical memory used; 22.4 GB of 42 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1438068521412_0193_01_000005 :
> 	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> 	|- 6704 6700 6700 6700 (java) 78760 20733 24033841152 5317812 java -Xmx20480M -Xms20480M
-cp .:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/had
oop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
org.apache.giraph.yarn.GiraphYarnTask 1438068521412 193 5 1 
> 	|- 6700 6698 6700 6700 (bash) 0 0 14376960 433 /bin/bash -c java -Xmx20480M -Xms20480M
-cp .:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
org.apache.giraph.yarn.GiraphYarnTask 1438068521412 193 5 1 1>/var/log/hadoop-yarn/container/application_1438068521412_0193/container_1438068521412_0193_01_000005/task-5-stdout.log
2>/var/log/hadoop-yarn/container/application_1438068521412_0193/container_1438068521412_0193_01_000005/task-5-stderr.log
 
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {noformat}
> Logs from container
> {noformat}
> 2015-08-04 12:34:51,258 INFO  [netty-server-worker-4] handler.RequestDecoder (RequestDecoder.java:channelRead(74)) - decode: Server window metrics MBytes/sec received = 12.5315, MBytesReceived = 380.217, ave received req MBytes = 0.007, secs waited = 30.34
> 2015-08-04 12:35:16,258 INFO  [check-memory] ooc.CheckMemoryCallable (CheckMemoryCallable.java:call(221)) - call: Memory is very limited now. Calling GC manually. freeMemory = 924.27MB
> {noformat}
> We are running our job like this:
> {noformat}
> hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
>  org.apache.giraph.GiraphRunner \
>  -Dgiraph.yarn.task.heap.mb=20480 \
>  -Dgiraph.isStaticGraph=true \
>  -Dgiraph.useOutOfCoreGraph=true \
>  -Dgiraph.logLevel=info \
>  -Dgiraph.weightedPageRank.superstepCount=5 \
>  ru.isys.WeightedPageRankComputation \
>  -vif ru.isys.CrawlerInputFormat \
>  -vip /tmp/bigdata/input \
>  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>  -op /tmp/giraph \
>  -w 6 \
>  -yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
