giraph-user mailing list archives

From Vishal Patel <write2vis...@gmail.com>
Subject Re: Saving checkpoints?
Date Sun, 12 Aug 2012 16:52:40 GMT
Thanks André, yes that file helps a lot. I changed a couple of those things
now to suit my application. I'm able to save checkpoints every 50
supersteps (changed CLEANUP_CHECKPOINTS_AFTER_SUCCESS_DEFAULT to false so I
can see the files).

How do I "manually" restart from, say, superstep 100 even though the job has
finished successfully? Should I change the string variable RESTART_SUPERSTEP to
_bsp/_checkpoints/job_201208071105_0007/100.finalized?

I'm assuming when it actually fails it will restart automatically from the
previous checkpoint.

Thank you.

Vishal


On Sun, Aug 12, 2012 at 3:57 AM, André Kelpe
<efeshundertelf@googlemail.com> wrote:

> Hi Vishal,
>
> you can control the checkpoint frequency with the setting
> "giraph.checkpointFrequency" in your JobConfiguration. The default is
> set to 0 right now, meaning no checkpoints are made. You should definitely
> check out the GiraphJob [0] code, where all these tuning knobs are
> documented.
>
> --André
>
> [0]
> https://github.com/apache/giraph/blob/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java#L308
>
> 2012/8/11 Vishal Patel <write2vishal@gmail.com>:
> > Hi,
> >
> > How do I specify the interval for saving checkpoints? When working with
> > Amazon's Elastic MapReduce on a large number of workers (> 80 workers,
> > 40 x m1.xlarge machines), there are sometimes RPC communication errors,
> > and ZooKeeper waits on that worker for a while before timing out and
> > killing the job altogether.
> >
> > As my graph and number of workers are becoming larger, I would like to
> > learn how to save checkpoints, say every 50 supersteps, since that extra
> > cost might be well worth it. Here is the command I currently use; how
> > should I modify it?
> >
> > hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar \
> > org.apache.giraph.GiraphRunner \
> > org.apache.giraph.examples.ConnectedComponentsVertex \
> > --inputFormat org.apache.giraph.examples.IntIntNullIntTextInputFormat \
> > --inputPath giraph_in/adj_list.txt \
> > --outputFormat org.apache.giraph.examples.VertexWithComponentTextOutputFormat \
> > --outputPath giraph_out \
> > --combiner org.apache.giraph.examples.MinimumIntCombiner \
> > --workers 95
> >
> > Also, how do I restart from a specific checkpoint? The help for the
> > GiraphRunner class did not have instructions on this.
> >
> > Thank you!
> >
> > Vishal
> >
> >
>
