giraph-user mailing list archives

From Manuel Lagang <manuellag...@gmail.com>
Subject Re: giraph hanging after superstep
Date Mon, 14 Oct 2013 18:26:32 GMT
I had the same issue with the out-of-core features on the 1.0.0-RC3 branch,
even for trivial datasets. The job would appear to finish all supersteps,
but it would then hang while writing the final output to HDFS. When I
switched to the latest code in trunk (which required some rewriting to
match the new interface), my jobs finished fine.


On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:

> Hi folks,
> We can successfully run Giraph on 1B vertices and around 20B edges in our
> cluster, which is great. But when we run it on the actual data, with 5B
> vertices and around 50B edges, we see issues in the final step while
> offloading the partitions. Since the dataset is huge for our cluster, we
> are using giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to
> spill the data when overloaded. With this setup all the supersteps
> finished within around 4 hours. But in the final step, after reporting
> saving vertices in the task status, the job hangs after writing a few
> partitions. This happens consistently in our case. I have played with all
> the config params and nothing helps, so any suggestions from you would be
> really helpful. Thanks a lot.
>
>  The log snippet:
>
> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
> 2013-10-14 10:24:20,183 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 1922 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
> 2013-10-14 10:24:22,928 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 13762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
> 2013-10-14 10:24:27,648 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 23682 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
> 2013-10-14 10:24:30,557 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 14882 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
> 2013-10-14 10:24:32,935 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11842 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
> 2013-10-14 10:24:33,714 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 962 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
> 2013-10-14 10:24:35,187 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 22722 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
> 2013-10-14 10:24:37,276 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 21762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
> 2013-10-14 10:24:39,868 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11362 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
> 2013-10-14 10:24:41,391 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 482 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>
> ------------------------------
>
>
> *The error shown on the job failure page for each attempt*
>
>
>
> FAILED
>
>
> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 seconds. Killing!
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
> Data Engineer,
> Admobius
>
> San Francisco, CA 94158
>
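
For reference, here is a minimal sketch of how the settings mentioned in the quoted message could be assembled as Hadoop `-D` options. The two `giraph.*` property names come from the message itself. The class and method names are hypothetical, and the `mapred.task.timeout` entry assumes an MRv1 cluster where that property is expressed in milliseconds; it is included only to show how the 7200 seconds in the failure message maps to a property value, not as a fix for the hang.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Giraph: it only builds option strings.
public class OutOfCoreOptions {

    // The two out-of-core flags named in the quoted message.
    static Map<String, String> outOfCoreProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("giraph.useOutOfCoreGraph", "true");
        props.put("giraph.useOutOfCoreMessages", "true");
        return props;
    }

    // Convert a timeout in seconds to the milliseconds a property would take.
    static long secondsToMillis(long seconds) {
        return seconds * 1000L;
    }

    // Render properties as space-separated -Dkey=value arguments,
    // preserving insertion order.
    static String toHadoopArgs(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = outOfCoreProperties();
        // Assumption: MRv1 task timeout property, value in milliseconds.
        props.put("mapred.task.timeout", Long.toString(secondsToMillis(7200)));
        System.out.println(toHadoopArgs(props));
    }
}
```

The printed string can be pasted after the job class on a `hadoop jar` command line, assuming the job is launched through a runner that accepts generic `-D` options.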
