flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: [Proposal] Gelly Graph Generators
Date Tue, 29 Sep 2015 14:01:49 GMT
I would be happy to see some generators in Gelly for exactly the reasons
you've mentioned. It's always difficult for me to get some testing data when
running Flink on a new cluster ... so this would help me ;)

On Thu, Sep 24, 2015 at 11:03 AM, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:

> Hi Greg,
>
> thank you for this proposal!
> I think graph generators will be a very useful addition to Gelly.
>
> I'm not quite familiar with the state-of-the-art algorithms for distributed
> graph generation.
> I suppose that we could easily provide an efficient random graph generator
> and I've also seen some work on parallel/distributed algorithms for R-MAT
> [1, 2].
> Are you aware of similar work for Erdos-Renyi, Kronecker or other types of
> graphs?
> Another place we might want to look at is Giraph's Watts-Strogatz generator
> [3].
>
> Cheers,
> Vasia.
>
> [1]: https://github.com/farkhor/PaRMAT/
> [2]: http://arxiv.org/pdf/1210.0187.pdf
> [3]: https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/WattsStrogatzVertexInputFormat.html
>
>
> On 23 September 2015 at 19:49, Greg Hogan <code@greghogan.com> wrote:
>
> > I would like to propose that Flink include a selection of graph generators
> > in Gelly. Generated graphs will be useful for performing scalability,
> > stress, and regression testing, as well as benchmarking and comparing
> > algorithms, both for Flink users and developers. Generated data is
> > infinitely scalable yet described by a few simple parameters, and can often
> > substitute for user data or for sharing large files when reporting issues.
> >
> > Spark's GraphX includes a modest GraphGenerators class [1].
> >
> > The initial implementation would focus on Erdos-Renyi, R-MAT [2], and
> > Kronecker [3] generators.
> >
> > A key consideration is that the graphs should be seedable and generate the
> > same Graph regardless of parallelism.
> >
> > Generated data is a complement to my proposed "Checksum method for DataSet
> > and Graph" [4].
> >
> > [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$
> > [2] R-MAT: A Recursive Model for Graph Mining;
> > http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf
> > [3] Kronecker graphs: An Approach to Modeling Networks;
> > http://arxiv.org/pdf/0812.4905v2.pdf
> > [4] https://issues.apache.org/jira/browse/FLINK-2716
> >
> > Greg Hogan
> >
>
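The requirement that a generated graph be seedable and identical regardless of parallelism can be met by splitting the edge list into fixed-size blocks and deriving each block's random seed only from the global seed and the block index, never from the worker that happens to process it. Below is a minimal sketch in plain Java of an R-MAT edge generator built that way. It is not Gelly's API or the proposal's implementation: the class and method names, the block size of 1024, the quadrant probabilities (a = 0.57, b = 0.19, c = 0.19) and the SplitMix64-style seed mixing are all hypothetical choices for illustration, and "parallelism" is only simulated by assigning blocks round-robin to workers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SplittableRandom;

public class RMatSketch {
    // Hypothetical R-MAT quadrant probabilities; d = 1 - A - B - C.
    static final double A = 0.57, B = 0.19, C = 0.19;
    // Hypothetical fixed block size, chosen independently of parallelism so block boundaries never move.
    static final int BLOCK_SIZE = 1024;

    // SplitMix64 finalizer: turns (seed, blockIndex) into a well-mixed per-block seed.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // One R-MAT edge: at each of `scale` levels, pick a quadrant of the adjacency
    // matrix and append one bit to the source and target vertex IDs.
    static long[] edge(SplittableRandom rnd, int scale) {
        long src = 0, dst = 0;
        for (int level = 0; level < scale; level++) {
            double r = rnd.nextDouble();
            src <<= 1;
            dst <<= 1;
            if (r < A) {
                // top-left quadrant: both bits stay 0
            } else if (r < A + B) {
                dst |= 1; // top-right
            } else if (r < A + B + C) {
                src |= 1; // bottom-left
            } else {
                src |= 1; // bottom-right
                dst |= 1;
            }
        }
        return new long[] {src, dst};
    }

    // Edges of one block; depends only on (seed, blockIndex), never on which worker runs it.
    static List<long[]> block(long seed, long blockIndex, long edgeCount, int scale) {
        SplittableRandom rnd = new SplittableRandom(mix(seed + blockIndex * 0x9E3779B97F4A7C15L));
        long start = blockIndex * BLOCK_SIZE;
        long end = Math.min(start + BLOCK_SIZE, edgeCount);
        List<long[]> out = new ArrayList<>();
        for (long i = start; i < end; i++) {
            out.add(edge(rnd, scale));
        }
        return out;
    }

    // Simulates `parallelism` workers: worker w takes blocks w, w + p, w + 2p, ...;
    // the blocks are then reassembled in index order.
    public static List<long[]> generate(long seed, long edgeCount, int scale, int parallelism) {
        int numBlocks = (int) ((edgeCount + BLOCK_SIZE - 1) / BLOCK_SIZE);
        List<List<long[]>> blocks = new ArrayList<>();
        for (int b = 0; b < numBlocks; b++) {
            blocks.add(null);
        }
        for (int w = 0; w < parallelism; w++) {
            for (int b = w; b < numBlocks; b += parallelism) {
                blocks.set(b, block(seed, b, edgeCount, scale));
            }
        }
        List<long[]> edges = new ArrayList<>();
        for (List<long[]> blk : blocks) {
            edges.addAll(blk);
        }
        return edges;
    }

    // True if both edge lists contain the same (src, dst) pairs in the same order.
    public static boolean sameEdges(List<long[]> x, List<long[]> y) {
        if (x.size() != y.size()) return false;
        for (int i = 0; i < x.size(); i++) {
            if (!Arrays.equals(x.get(i), y.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<long[]> p1 = generate(42L, 5000, 10, 1);
        List<long[]> p4 = generate(42L, 5000, 10, 4);
        // prints "5000 edges, identical: true"
        System.out.println(p1.size() + " edges, identical: " + sameEdges(p1, p4));
    }
}
```

The design choice that matters is the fixed block size: if each worker instead seeded its generator from its own subtask index, changing the parallelism would change which random stream produced which edges, and the same seed would yield a different graph.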
