hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: PageRank Experiment Iteration
Date Wed, 24 Oct 2012 09:00:20 GMT
Yes, I generated it for an algorithm on movie actors (to calculate Kevin
Bacon numbers).
However, as I already told you, you can rewrite the MapReduce generator
job that creates random input for SSSP:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/bsp/RandomGraphGenerator.java

Basically, you have to remove the weights from the output of RandomMapper.
So instead of

s += Long.toString(rowId) + ":" + rand.nextInt(100) + "\t";

you would do:

s += Long.toString(rowId) + "\t";

Of course, you can also use a StringBuilder instead of +=, but string
concatenation usually isn't a bottleneck in MapReduce ;))
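The change above can be sketched as a small standalone helper that builds one unweighted adjacency-list line with a StringBuilder. The class and method names (UnweightedLineBuilder, buildLine) and the parameters are hypothetical and not taken from RandomGraphGenerator; this is only a sketch of the output format without the ":weight" suffix, not the generator's actual code.

```java
import java.util.Random;

public class UnweightedLineBuilder {

    // Hypothetical helper, not taken from RandomGraphGenerator: builds one
    // adjacency-list line "vertex<TAB>adj1<TAB>adj2..." without the ":weight"
    // suffix, using a StringBuilder instead of repeated String +=.
    static String buildLine(long vertexId, int numEdges, int numVertices, Random rand) {
        StringBuilder sb = new StringBuilder();
        sb.append(vertexId);
        for (int i = 0; i < numEdges; i++) {
            // Random target in [0, numVertices); self-loops and duplicate
            // edges are not filtered out in this sketch.
            int target = rand.nextInt(numVertices);
            sb.append('\t').append(target);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rand = new Random(42L);
        // Print a few sample lines for a 100-vertex graph with 3 edges each.
        for (long v = 0; v < 3; v++) {
            System.out.println(buildLine(v, 3, 100, rand));
        }
    }
}
```

Unlike the += version, this puts the tab before each neighbor, so the line has no trailing tab.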

2012/10/24 Shuo Wang <ecisp.wangshuo@gmail.com>

> Did you generate the data yourself? Can you share the data generator with
> me?
>
> 2012/10/24 Thomas Jungblut <thomas.jungblut@gmail.com>
>
> > 12 GB. It uses several times (up to 10x?) more memory than the dataset
> > size.
> >
> > 2012/10/24 Shuo Wang <ecisp.wangshuo@gmail.com>
> >
> > > How large is your data? Our cluster has 10 nodes and 45 tasks, and each
> > > task has 512 MB of memory. But when I run the 200 MB dataset, it fails
> > > with an out-of-memory error.
> > >
> > > 2012/10/24 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >
> > > > Sure, it runs, if you have enough RAM ;)
> > > >
> > > > 2012/10/24 Shuo Wang <ecisp.wangshuo@gmail.com>
> > > >
> > > > > How much data have you run PageRank on in HAMA? Does it run? I want to
> > > > > run PageRank on large data in HAMA, but it always fails.
> > > > >
> > > > > 2012/10/24 Thomas Jungblut <thomas.jungblut@gmail.com>
> > > > >
> > > > > > Yes, it works on any directed graph.
> > > > > > The best format to use is
> > > > > >
> > > > > > Vertex <\t> AdjacentVertex1 <\t> AdjacentVertex2 etc.
> > > > > >
> > > > > > So you have an adjacency list in which each line represents one vertex.
> > > > > > This format is splittable, because each line holds a complete vertex;
> > > > > > the web-Google edge list, where a vertex's edges span several lines,
> > > > > > is not.
> > > > > >
> > > > > > 2012/10/24 Shuo Wang <ecisp.wangshuo@gmail.com>
> > > > > >
> > > > > > > Thanks! Does the PageRank example work on any web graph? I generated a
> > > > > > > random web graph in the same format as web-Google.txt, but the result
> > > > > > > is infinity.
> > > > > > >
> > > > > > > 2012/10/24 Thomas Jungblut <thomas.jungblut@gmail.com>
> > > > > > >
> > > > > > > > Because graph iterations != supersteps. You have to take into account
> > > > > > > > the partitioning and the time needed to count the total number of
> > > > > > > > vertices. PageRank also requires an additional superstep to run
> > > > > > > > aggregators.
> > > > > > > >
> > > > > > > > 2012/10/24 Shuo Wang <ecisp.wangshuo@gmail.com>
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I ran PageRank on HAMA with the maximum number of iterations set to
> > > > > > > > > 20, but it ran 48 supersteps. Why?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
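The adjacency-list format recommended earlier in the thread (one vertex per line, followed by its tab-separated adjacent vertices) can be produced from a web-Google-style edge list with one "source<TAB>target" pair per line. Below is a minimal in-memory sketch; the class name EdgeListToAdjacency is hypothetical, and two details are assumptions: lines starting with '#' are treated as comment headers, and vertices that only appear as edge targets get a line of their own with no neighbors. Whether HAMA's PageRank input reader expects such sink lines is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EdgeListToAdjacency {

    // Groups "source<TAB>target" edge lines into one line per vertex:
    // "vertex<TAB>adj1<TAB>adj2...". Lines starting with '#' are assumed to be
    // comment headers and are skipped, as are blank lines.
    static List<String> convert(List<String> edgeLines) {
        // TreeMap keeps the output sorted by vertex ID.
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (String line : edgeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            long source = Long.parseLong(parts[0]);
            long target = Long.parseLong(parts[1]);
            adjacency.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
            // Assumption: sink vertices (no out-edges) still get their own line,
            // so every vertex referenced by an edge exists in the input.
            adjacency.computeIfAbsent(target, k -> new ArrayList<>());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> e : adjacency.entrySet()) {
            StringBuilder sb = new StringBuilder(Long.toString(e.getKey()));
            for (long neighbor : e.getValue()) {
                sb.append('\t').append(neighbor);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> edges = List.of("# FromNodeId\tToNodeId", "1\t2", "1\t3", "2\t1");
        for (String line : convert(edges)) {
            System.out.println(line);
        }
    }
}
```

For the sample edges this prints three lines: vertex 1 with neighbors 2 and 3, vertex 2 with neighbor 1, and vertex 3 alone. Because it holds the whole graph in memory, it only suits small inputs; for a multi-gigabyte edge list the same grouping would more naturally be done as a MapReduce job that keys each edge by its source vertex.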
