giraph-user mailing list archives

From Vitaly Tsvetkoff <>
Subject Problem with big datasets on cloudera yarn cluster
Date Mon, 20 Jul 2015 13:01:13 GMT
Hello everyone one more time!
I am a newbie to Hadoop and Giraph. I wrote a custom Giraph algorithm,
CustomWeightedPageRank (one of the PageRank modifications), and a
CustomInputFormat for it (I put both into the giraph-examples jar). It runs
successfully on a Cloudera YARN cluster (4 machines, each with 6 cores and 12
hardware threads) on small datasets (the bundled examples and ~8 million
vertices), but it fails every time on bigger datasets (~10 million vertices).

The command line is:
hadoop jar
 org.apache.giraph.GiraphRunner \
 -Dgiraph.yarn.task.heap.mb=4096 \
 -Dgiraph.isStaticGraph=true \
 -Dgiraph.useOutOfCoreGraph=true \
 -Dgiraph.useOutOfCoreMessages=true \
 -Dgiraph.numInputThreads=12 \
 -Dgiraph.numComputeThreads=12 \
 -Dgiraph.weightedPageRank.superstepCount=30 \
 ru.custom.CustomWeightedPageRankComputation \
 -vif ru.custom.CustomInputFormat \
 -vip /tmp/giraph_input \
 -vof \
 -op /tmp/giraph \
 -w 12
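As a quick sanity check of these settings, here is the memory arithmetic they imply. This is my own back-of-the-envelope calculation, assuming all 12 workers run concurrently across the 4 machines and each machine has 12 GB of RAM:

```python
# Sanity check of the memory request implied by the flags above.
# Assumption (mine): all 12 workers run at the same time on the
# 4 machines, and each machine has 12 GB of RAM.
workers = 12                # -w 12
heap_mb_per_worker = 4096   # -Dgiraph.yarn.task.heap.mb=4096
machines = 4
ram_mb_per_machine = 12 * 1024

requested_mb = workers * heap_mb_per_worker   # total heap requested
available_mb = machines * ram_mb_per_machine  # total cluster RAM

print(requested_mb, available_mb)  # 49152 49152
```

So the worker heaps alone would consume the entire 48 GB of cluster RAM, before counting JVM overhead, the application master, HDFS/YARN daemons, or the OS, which might explain the crashes on larger inputs.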

Please see the container logs and the main log.
The computation seems to start fine, but it crashes during one of the
supersteps. Maybe I am using "bad" properties? Should the -w value equal the
machine count or the thread count?

I hope somebody here can help me solve this problem!
