giraph-dev mailing list archives

From "Rob Vesse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-810) Giraph should track aggregate statistics over lifetime of the computation
Date Wed, 04 Dec 2013 16:15:36 GMT
Rob Vesse created GIRAPH-810:
--------------------------------

             Summary: Giraph should track aggregate statistics over lifetime of the computation
                 Key: GIRAPH-810
                 URL: https://issues.apache.org/jira/browse/GIRAPH-810
             Project: Giraph
          Issue Type: Improvement
    Affects Versions: 1.1.0
            Reporter: Rob Vesse


When Giraph completes a job it reports a set of information about the job like so:

{noformat}
Giraph Timers
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Superstep 3 TriangleFindingComputation (ms)=102234
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Superstep 2 TriangleFindingComputation (ms)=29419
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Superstep 1 TriangleFindingComputation (ms)=34397
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Input superstep (ms)=12642
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Total (ms)=208962
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Superstep 0 TriangleFindingComputation (ms)=4201
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Shutdown (ms)=2698
2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):     Setup (ms)=23351
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):   Zookeeper server:port
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     ip-10-145-221-220.ec2.internal:22181=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):   Giraph Stats
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Aggregate edges=150000
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Sent message bytes=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Superstep=4
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Last checkpointed superstep=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Current workers=16
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Current master task partition=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Sent messages=0
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Aggregate finished vertices=1000
2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):     Aggregate vertices=1000
{noformat}

The problem is that some of these statistics are not particularly helpful because they pertain
only to the most recent superstep, namely Sent messages and Sent message bytes (both are
reported as 0 in the output above).

I understand there is a reason for this: the number of sent messages helps determine whether
a computation should halt at a given superstep. It would nevertheless be useful if these
counters were also tracked in aggregate over the lifetime of the computation.
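
To illustrate the request, here is a minimal sketch of the bookkeeping I have in mind. It is not taken from the Giraph code base: the class LifetimeMessageStats and all of its method names are hypothetical, and the numbers in main are made-up illustrative values. The idea is simply to keep the per-superstep values (so the existing halting check is unaffected) while also accumulating running totals.

```java
/**
 * Hypothetical sketch, not part of the Giraph API: keeps the counters for the
 * most recent superstep alongside totals accumulated over the whole job.
 */
final class LifetimeMessageStats {
    // Values for the most recent superstep (what is reported today).
    private long lastSentMessages;
    private long lastSentMessageBytes;

    // Running totals over every superstep recorded so far.
    private long totalSentMessages;
    private long totalSentMessageBytes;

    /** Record the counters observed at the end of one superstep. */
    void recordSuperstep(long sentMessages, long sentMessageBytes) {
        if (sentMessages < 0 || sentMessageBytes < 0) {
            throw new IllegalArgumentException("counters must be non-negative");
        }
        lastSentMessages = sentMessages;
        lastSentMessageBytes = sentMessageBytes;
        totalSentMessages += sentMessages;
        totalSentMessageBytes += sentMessageBytes;
    }

    long getLastSentMessages() { return lastSentMessages; }
    long getLastSentMessageBytes() { return lastSentMessageBytes; }
    long getTotalSentMessages() { return totalSentMessages; }
    long getTotalSentMessageBytes() { return totalSentMessageBytes; }

    public static void main(String[] args) {
        LifetimeMessageStats stats = new LifetimeMessageStats();
        // Made-up values: two active supersteps, then a final one that sends nothing.
        stats.recordSuperstep(1000, 8000);
        stats.recordSuperstep(250, 2000);
        stats.recordSuperstep(0, 0);

        // The last-superstep view ends at zero, but the totals are preserved.
        System.out.println("Sent messages=" + stats.getLastSentMessages());
        System.out.println("Aggregate sent messages=" + stats.getTotalSentMessages());
        System.out.println("Aggregate sent message bytes=" + stats.getTotalSentMessageBytes());
    }
}
```

With something along these lines, the final job report could show both the last-superstep values and the lifetime totals. How the totals would actually be wired into the existing counters is left open here.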



--
This message was sent by Atlassian JIRA
(v6.1#6144)
