From: Martin Junghanns
To: user@flink.apache.org
Subject: LDBC Graph Data into Flink
Date: Tue, 6 Oct 2015 10:03:12 +0200

Hi all,

For our benchmarks with Flink, we are using a data generator provided by the LDBC project (Linked Data Benchmark Council) [1][2]. The generator uses MapReduce to create directed, labeled, attributed graphs that mimic properties of real online social networks (e.g., degree distribution, diameter). The output is stored in several files, either on the local file system or in HDFS. Each file represents a vertex, edge or multi-valued property class.
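To make the file-per-class layout more concrete, here is a minimal sketch in plain Python. It is not Flink code and not taken from the tool. It assumes '|'-separated files with a header row, vertex files whose first column is the class-local LDBC id, and edge files whose first two columns are the source and target ids. The function names, the sample rows and the id-remapping scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: LDBC-style CSV files use '|' as separator and start with a header row.
SEPARATOR = "|"


@dataclass
class Vertex:
    id: int
    label: str
    payload: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    source_id: int
    target_id: int
    payload: Dict[str, str] = field(default_factory=dict)


def _rows(lines: Iterable[str]) -> Tuple[List[str], Iterator[List[str]]]:
    """Split the header row off and return (header, iterator over data rows)."""
    it = iter(lines)
    header = next(it).rstrip("\n").split(SEPARATOR)
    return header, (line.rstrip("\n").split(SEPARATOR) for line in it)


def parse_vertices(label: str, lines: Iterable[str], id_gen: Iterator[int],
                   id_map: Dict[Tuple[str, str], int]) -> List[Vertex]:
    """Hypothetical helper: turn one vertex-class file into Vertex records.

    The first column is treated as the class-local LDBC id and remapped to a
    fresh id, so that e.g. person 1 and forum 1 do not collide."""
    header, rows = _rows(lines)
    vertices = []
    for values in rows:
        new_id = next(id_gen)
        id_map[(label, values[0])] = new_id
        payload = dict(zip(header[1:], values[1:]))  # remaining columns
        vertices.append(Vertex(new_id, label, payload))
    return vertices


def parse_edges(label: str, source_label: str, target_label: str,
                lines: Iterable[str], id_gen: Iterator[int],
                id_map: Dict[Tuple[str, str], int]) -> List[Edge]:
    """Hypothetical helper: turn one edge-class file into Edge records,
    resolving LDBC source/target ids to the remapped vertex ids."""
    header, rows = _rows(lines)
    edges = []
    for values in rows:
        source = id_map[(source_label, values[0])]
        target = id_map[(target_label, values[1])]
        payload = dict(zip(header[2:], values[2:]))  # columns after the two ids
        edges.append(Edge(next(id_gen), label, source, target, payload))
    return edges


if __name__ == "__main__":
    ids = count()  # one counter, so vertex and edge ids are all unique
    id_map: Dict[Tuple[str, str], int] = {}
    persons = parse_vertices("person", ["id|firstName", "101|Alice", "102|Bob"], ids, id_map)
    knows = parse_edges("knows", "person", "person",
                        ["Person.id|Person.id|creationDate", "101|102|2010-03-13"],
                        ids, id_map)
    print(persons)
    print(knows)
```

Multi-valued property files are left out here; they could be folded into the matching vertex payloads the same way. In a distributed Flink job, the in-memory `id_map` lookup would more naturally become a join between the edge and vertex datasets on (label, LDBC id), and a single local counter would not give unique ids across parallel tasks.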
I wrote a little tool that parses the LDBC output and transforms it into two datasets representing vertices and edges. Each vertex has a unique ID, a label, and a payload according to the LDBC schema. Each edge has a unique ID, a label, source and target vertex IDs, and also a payload according to the schema. I thought this might be useful for others, so I put it on GitHub [3]. It currently uses Flink 0.10-SNAPSHOT, as it depends on some fixes made there.

Best,
Martin

[1] http://ldbcouncil.org/
[2] https://github.com/ldbc/ldbc_snb_datagen
[3] https://github.com/s1ck/ldbc-flink-import