giraph-user mailing list archives

From Benjamin Heitmann
Subject Re: Giraph still running after mappers are 100% finished ?
Date Tue, 01 May 2012 15:25:11 GMT

Hello Avery, 

On 1 May 2012, at 15:45, Avery Ching wrote:

> I wonder if the issues you are seeing are related to
> This shouldn't happen.

Good to know that that should not happen. 

For my specific algorithm it happens all the time.
For small amounts of processing, the job finishes 2 minutes after the mappers report 100%.

For larger amounts it can take 20 minutes or so. So there is definitely a connection between
the expected processing time of the job
and the amount of time that passes after the mappers report 100%.

I even had a pretty extreme case where most of the workers were restarted after an hour,
and I killed the job after 90 minutes.

In addition, the "100% map" always comes about 14-15 minutes after starting the job, independent
of the total processing time. 
That might be due to the time it takes to read in the data, which is always around 11 minutes
for the "vertex input superstep". 
(The data (and its size) which my job reads in order to construct the graph is always the
same. Only the "configuration" of the algorithm changes. 
In my case, the configuration consists of the set of start nodes, and the association between
different start nodes and user ids). 

Should I attach a zip file of the log directory for the job which restarted most of its workers
after an hour?
I can attach that to the JIRA issue.