giraph-user mailing list archives

From Jyotirmoy Sundi <sundi...@gmail.com>
Subject Re: giraph hanging after superstep
Date Mon, 14 Oct 2013 19:16:53 GMT
The latest trunk compiled without the need to change any interfaces, apart
from adding a new exception to one of the classes.
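
For reference, the two out-of-core options discussed further down this
thread are ordinary job configuration keys. As far as I know, GiraphRunner
accepts such keys through repeated -ca key=value arguments. The sketch below
only assembles that argument list in Python and does not launch anything.
The helper name custom_args is made up for illustration, and the jar,
computation class and input/output arguments that a real command line needs
are deliberately left out.

```python
# Sketch only: builds "-ca key=value" arguments for the two out-of-core
# options named in this thread. custom_args is a hypothetical helper,
# not part of Giraph.
def custom_args(options):
    """Return a flat list of -ca arguments, one pair per option, sorted by key."""
    args = []
    for key in sorted(options):
        value = options[key]
        # Hadoop-style configuration booleans are lowercase strings.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.extend(["-ca", f"{key}={value}"])
    return args


out_of_core = {
    "giraph.useOutOfCoreGraph": True,
    "giraph.useOutOfCoreMessages": True,
}

# Print the fragment that would be appended to a GiraphRunner invocation.
print(" ".join(custom_args(out_of_core)))
```

Keeping the options in one dictionary makes it easy to switch both flags off
together when comparing an in-memory run against an out-of-core run.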


On Mon, Oct 14, 2013 at 11:40 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:

> Thanks, will try that out. The rewriting in saveVertices to match the new
> interfaces does not seem too big.
> Did you find out later what the underlying issue might have been?
>
> Thanks
> Sund
>
>
> On Mon, Oct 14, 2013 at 11:26 AM, Manuel Lagang <manuellagang@gmail.com> wrote:
>
>> I also had the same issues when I used the out-of-core features, even for
>> trivial datasets, when I used the 1.0.0-RC3 branch. The job would seem to
>> finish all supersteps, but it would hang during the final output of data to
>> HDFS. I found that if I used the latest code in trunk instead (which
>> required some rewriting to match the new interface), then my jobs would
>> finish fine.
>>
>>
>> On Mon, Oct 14, 2013 at 11:13 AM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
>>
>>> Hi folks,
>>> We are able to run Giraph successfully on 1B vertices and around 20B
>>> edges in our cluster. This is great. But when we run it on the actual
>>> data, with 5B vertices and around 50B edges, we see issues in the final
>>> step while offloading the partitions. Since the dataset is huge for our
>>> cluster, we are using giraph.useOutOfCoreGraph and
>>> giraph.useOutOfCoreMessages to spill the data when overloaded. With this
>>> setup all the supersteps finished within around 4 hours. But in the final
>>> step, after reporting "saving vertices" in the task status, it hangs after
>>> writing a few partitions. This happens consistently in our case. I played
>>> with all the config params and nothing helps; any suggestions from you
>>> would be really helpful. Thanks a lot.
>>>
>>>  The log snippet:
>>>
>>> 2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
>>> 2013-10-14 10:24:20,183 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 1922 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-1922_vertices
>>> 2013-10-14 10:24:20,307 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_addressesAndPartitions, type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,431 WARN org.apache.giraph.bsp.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201310130212_0013/_applicationAttemptsDir/0/_superstepDir/15/_superstepFinished, type=NodeDeleted, state=SyncConnected)
>>> 2013-10-14 10:24:20,555 INFO org.apache.giraph.worker.BspServiceWorker: processEvent: Job state changed, checking to see if it needs to restart
>>> 2013-10-14 10:24:20,640 INFO org.apache.giraph.bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/job_201310130212_0013/_masterJobState)
>>> 2013-10-14 10:24:22,928 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 13762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-13762_vertices
>>> 2013-10-14 10:24:27,648 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 23682 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-23682_vertices
>>> 2013-10-14 10:24:30,557 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 14882 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-14882_vertices
>>> 2013-10-14 10:24:32,935 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11842 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11842_vertices
>>> 2013-10-14 10:24:33,714 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 962 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-962_vertices
>>> 2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
>>> 2013-10-14 10:24:35,187 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 22722 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-22722_vertices
>>> 2013-10-14 10:24:37,276 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 21762 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-21762_vertices
>>> 2013-10-14 10:24:39,868 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 11362 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-11362_vertices
>>> 2013-10-14 10:24:41,391 INFO org.apache.giraph.partition.DiskBackedPartitionStore: offloadPartition: writing partition vertices 482 to /mnt/diskg/mapred/local/taskTracker/sundi133/jobcache/job_201310130212_0013/attempt_201310130212_0013_m_000060_0/work/_bsp/_partitions/job_201310130212_0013/partition-482_vertices
>>>
>>> ------------------------------
>>>
>>>
>>> *The error shown on the job failure page for each attempt*
>>>
>>>
>>>
>>> FAILED
>>>
>>>
>>> Task attempt_201310130212_0013_m_000001_0 failed to report status for 7200 seconds. Killing!
>>>
>>>
>>> --
>>> Best Regards,
>>> Jyotirmoy Sundi
>>> Data Engineer,
>>> Admobius
>>>
>>> San Francisco, CA 94158
>>>
>>
>>
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
> Data Engineer,
> Admobius
>
> San Francisco, CA 94158
>



-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
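
A rough way to see how far a worker got before it stopped reporting is to
pull the saveVertices progress lines out of the task log. The sketch below
assumes the message format shown in the log snippet quoted above, with each
entry on a single line. The function and field names are hypothetical, and
other Giraph versions may word these messages differently.

```python
import re
from datetime import datetime

# Two entries copied from the worker log quoted in this thread.
LOG = """\
2013-10-14 10:24:20,144 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Starting to save 26146422 vertices
2013-10-14 10:24:35,184 INFO org.apache.giraph.worker.BspServiceWorker: saveVertices: Saved 978047 out of 26146422 vertices, on partition 5 out of 160
"""

TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
START = re.compile(TS + r" .*saveVertices: Starting to save (?P<total>\d+) vertices")
PROGRESS = re.compile(
    TS + r" .*saveVertices: Saved (?P<saved>\d+) out of (?P<total>\d+) vertices, "
    r"on partition (?P<part>\d+) out of (?P<parts>\d+)"
)


def parse_ts(text):
    """Parse a log4j-style timestamp such as '2013-10-14 10:24:20,144'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def save_progress(log_text):
    """Summarise the last saveVertices progress line, or return None if absent."""
    started = None
    last = None
    for line in log_text.splitlines():
        # search() rather than match() so quoted lines ('>>> ...') still parse.
        m = START.search(line)
        if m:
            started = parse_ts(m.group("ts"))
            continue
        m = PROGRESS.search(line)
        if m:
            last = m
    if last is None:
        return None
    saved = int(last.group("saved"))
    total = int(last.group("total"))
    at = parse_ts(last.group("ts"))
    return {
        "saved": saved,
        "total": total,
        "partition": int(last.group("part")),
        "partitions": int(last.group("parts")),
        "percent": 100.0 * saved / total,
        # Seconds between the start message and the last progress message.
        "elapsed_s": (at - started).total_seconds() if started else None,
    }


summary = save_progress(LOG)
print(summary)
```

On the two sample lines this reports 978047 of 26146422 vertices saved
(about 3.7%), on partition 5 of 160, 15.04 seconds after the save started.
Comparing the timestamp of the last such line with the time the task was
killed gives an idea of how long the worker sat idle.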
