giraph-user mailing list archives

From Jyotirmoy Sundi <sundi...@gmail.com>
Subject Re: giraph hanging after superstep
Date Mon, 14 Oct 2013 18:40:53 GMT
Thanks, I will try that out. The rewriting in saveVertices needed to match
the new interfaces does not seem too big.
Did you find out later what the underlying cause of the issue might be?

Thanks
Sund


On Mon, Oct 14, 2013 at 11:26 AM, Manuel Lagang <manuellagang@gmail.com> wrote:

> I had the same issue with the out-of-core features on the 1.0.0-RC3
> branch, even for trivial datasets. The job would appear to finish all
> supersteps, but it would hang during the final output of data to HDFS.
> When I used the latest code in trunk instead (which required some
> rewriting to match the new interface), my jobs finished fine.
>
>
> On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
>
>> Hi folks,
>> We can successfully run Giraph on 1B vertices and around 20B edges in
>> our cluster, which is great. But when we run it on the actual data, with
>> 5B vertices and around 50B edges, we see issues in the final step while
>> offloading the partitions. Since the dataset is huge for our cluster, we
>> are using giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages to
>> spill the data when overloaded. With this setup, all the supersteps
>> finished within around 4 hours. But in the final step, after the task
>> status reports that it is saving vertices, the job hangs after writing a
>> few partitions. This happens consistently in our case. I have tried all
>> the config params and nothing helps. Any suggestions would be really
>> helpful. Thanks a lot.
>>
>>  The log snippet:
>>
>> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
>> 2013-10-14 10:24:20,183 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 1922 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
>> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
>> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished, type=NodeDeleted, state=SyncConnected)
>> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
>> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
>> 2013-10-14 10:24:22,928 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 13762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
>> 2013-10-14 10:24:27,648 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 23682 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
>> 2013-10-14 10:24:30,557 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 14882 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
>> 2013-10-14 10:24:32,935 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11842 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
>> 2013-10-14 10:24:33,714 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 962 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
>> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
>> 2013-10-14 10:24:35,187 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 22722 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
>> 2013-10-14 10:24:37,276 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 21762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
>> 2013-10-14 10:24:39,868 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11362 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
>> 2013-10-14 10:24:41,391 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 482 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>>
>> ------------------------------
>>
>>
>> *The error shown on the job failure page for each attempt*
>>
>>
>>
>> FAILED
>>
>>
>> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 seconds. Killing!
>>
>>
>> --
>> Best Regards,
>> Jyotirmoy Sundi
>> Data Engineer,
>> Admobius
>>
>> San Francisco, CA 94158
>>
>
>


-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
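
For anyone watching a similar run, the saveVertices progress lines in the quoted log can be parsed to see how far a worker got before it stalled. What follows is a minimal sketch in plain Java and is not part of Giraph: the class SaveProgress and its methods are hypothetical names, and the regular expression assumes the exact message wording that appears in the log snippet in this thread.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Giraph class): parses saveVertices progress lines. */
public class SaveProgress {
    // Assumes the message wording seen in this thread's log:
    // "saveVertices: Saved <n> out of <total> vertices, on partition <p> out of <q>"
    private static final Pattern SAVED = Pattern.compile(
        "saveVertices:\\s+Saved (\\d+) out of (\\d+) vertices, "
        + "on partition (\\d+) out of (\\d+)");

    /** Counters extracted from one progress line. */
    public static final class Progress {
        public final long savedVertices;
        public final long totalVertices;
        public final int partition;
        public final int totalPartitions;

        Progress(long savedVertices, long totalVertices,
                 int partition, int totalPartitions) {
            this.savedVertices = savedVertices;
            this.totalVertices = totalVertices;
            this.partition = partition;
            this.totalPartitions = totalPartitions;
        }

        /** Fraction of this worker's vertices written so far. */
        public double fractionSaved() {
            return totalVertices == 0 ? 0.0 : (double) savedVertices / totalVertices;
        }
    }

    /** Returns the parsed counters, or empty if the line is not a progress line. */
    public static Optional<Progress> parse(String line) {
        Matcher m = SAVED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Progress(
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        // Progress line copied from the log in this thread.
        String line = "2013-10-14 10:24:35,184 INFO "
            + "org.apache.giraph.worker.BspServiceWorker: saveVertices: "
            + "Saved 978047 out of 26146422 vertices, on partition 5 out of 160";
        parse(line).ifPresent(p -> System.out.printf(
            "saved %d of %d vertices (%.2f%%), partition %d of %d%n",
            p.savedVertices, p.totalVertices, 100.0 * p.fractionSaved(),
            p.partition, p.totalPartitions));
    }
}
```

Feeding each worker's log through parse and keeping the last match gives a quick view of which workers stopped making progress and at which partition, which may help narrow down where the hang occurs.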
