From: Robert Metzger
Date: Tue, 29 Sep 2015 16:01:49 +0200
Subject: Re: [Proposal] Gelly Graph Generators
To: "dev@flink.apache.org"
Reply-To: dev@flink.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=089e013c64447c6c4c0520e342ca

--089e013c64447c6c4c0520e342ca
Content-Type: text/plain; charset=UTF-8

I would be happy to see some generators in Gelly for exactly the reasons
you've mentioned.
It's always difficult for me to get some testing data when running Flink on
a new cluster ...
so this would help me ;)

On Thu, Sep 24, 2015 at 11:03 AM, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:

> Hi Greg,
>
> thank you for this proposal!
> I think graph generators will be a very useful addition to Gelly.
>
> I'm not quite familiar with the state-of-the-art algorithms for distributed
> graph generation.
> I suppose that we could easily provide an efficient random graph generator
> and I've also seen some work on parallel/distributed algorithms for R-MAT
> [1, 2].
> Are you aware of similar work for Erdos-Renyi, Kronecker or other types of
> graphs?
> Another place we might want to look at is Giraph's Watts-Strogatz generator
> [3].
>
> Cheers,
> Vasia.
>
> [1]: https://github.com/farkhor/PaRMAT/
> [2]: http://arxiv.org/pdf/1210.0187.pdf
> [3]:
> https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/WattsStrogatzVertexInputFormat.html
>
>
> On 23 September 2015 at 19:49, Greg Hogan wrote:
>
> > I would like to propose that Flink include a selection of graph generators
> > in Gelly. Generated graphs will be useful for performing scalability,
> > stress, and regression testing as well as benchmarking and comparing
> > algorithms, both for Flink users and developers. Generated data is
> > infinitely scalable yet described by a few simple parameters and can often
> > substitute for user data or sharing large files when reporting issues.
> >
> > Spark's GraphX includes a modest GraphGenerators class [1].
> >
> > The initial implementation would focus on Erdos-Renyi, R-MAT [2], and
> > Kronecker [3] generators.
> >
> > A key consideration is that the graphs should be seedable and generate the
> > same Graph regardless of parallelism.
> >
> > Generated data is a complement to my proposed "Checksum method for DataSet
> > and Graph" [4].
> >
> > [1]
> > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$
> > [2] R-MAT: A Recursive Model for Graph Mining;
> > http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf
> > [3] Kronecker graphs: An Approach to Modeling Networks;
> > http://arxiv.org/pdf/0812.4905v2.pdf
> > [4] https://issues.apache.org/jira/browse/FLINK-2716
> >
> > Greg Hogan
> >
>

--089e013c64447c6c4c0520e342ca--
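
One way to meet the requirement quoted above, that a generated graph be
seedable and identical regardless of parallelism, can be sketched in plain
Java. This is only an illustration under stated assumptions, not Gelly code
and not the approach the proposal settled on: the class SeededErdosRenyi, its
method names, and the per-row seeding scheme are all hypothetical. The idea is
that the random stream for each source vertex depends only on the global seed
and the vertex id, so it does not matter which worker generates which row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: Erdos-Renyi G(n, p) whose output depends only on
// (n, p, seed), never on how the rows are divided among workers.
final class SeededErdosRenyi {

    // Derive a per-row seed from the global seed and the vertex id
    // (SplitMix64-style bit mixing; the choice of mixer is an assumption).
    static long rowSeed(long seed, long vertex) {
        long z = seed + (vertex + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Undirected edges (i, j) with i < j for one source vertex i.
    static List<long[]> edgesForRow(int n, double p, long seed, int i) {
        Random rnd = new Random(rowSeed(seed, i));
        List<long[]> edges = new ArrayList<>();
        for (int j = i + 1; j < n; j++) {
            if (rnd.nextDouble() < p) {
                edges.add(new long[] {i, j});
            }
        }
        return edges;
    }

    // Simulates `parallelism` workers in a single thread; worker w handles
    // every row i with i % parallelism == w. Edges are returned as "i-j".
    static Set<String> generate(int n, double p, long seed, int parallelism) {
        Set<String> result = new TreeSet<>();
        for (int w = 0; w < parallelism; w++) {
            for (int i = w; i < n; i += parallelism) {
                for (long[] e : edgesForRow(n, p, seed, i)) {
                    result.add(e[0] + "-" + e[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> one = generate(100, 0.05, 42L, 1);
        Set<String> four = generate(100, 0.05, 42L, 4);
        // The edge set is the same for 1 and 4 simulated workers.
        System.out.println(one.equals(four)); // prints "true"
    }
}
```

The trade-off in this sketch is one Random instance per source vertex, which
keeps the rows independent at the cost of re-seeding n times. A distributed
version would need the same property: no random stream shared across
partition boundaries.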