Subject: Re: LDBC Graph Data into Flink
From: Vasiliki Kalavri
To: user@flink.apache.org
Date: Tue, 6 Oct 2015 10:53:02 +0200

Hi Martin,

Thanks a lot for sharing! This is a very useful tool.
I only had a quick look, but if we merge label and payload inside a Tuple2, then it should also be Gelly-compatible :)
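
To make the Tuple2 idea concrete, here is a minimal Java sketch. It is not code from the importer: SketchTuple2 and LdbcVertex are hypothetical stand-ins so it runs without Flink on the classpath, and the Long id and Map payload types are assumptions. The only Gelly fact it relies on is that a Vertex<K, VV> is a 2-tuple of (id, value), so label and payload have to share the single value slot.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Flink's Tuple2 (fields f0, f1), so no Flink dependency is needed.
final class SketchTuple2<A, B> {
    final A f0;
    final B f1;

    SketchTuple2(A f0, B f1) {
        this.f0 = f0;
        this.f1 = f1;
    }
}

// Assumed shape of an imported vertex: id, label and payload as three separate fields.
final class LdbcVertex {
    final long id;
    final String label;
    final Map<String, Object> payload;

    LdbcVertex(long id, String label, Map<String, Object> payload) {
        this.id = id;
        this.label = label;
        this.payload = payload;
    }
}

final class GellyShapeSketch {
    // Merge label and payload into one value, giving the (id, value) shape
    // that a Gelly Vertex<K, VV> has.
    static SketchTuple2<Long, SketchTuple2<String, Map<String, Object>>> toGellyShape(LdbcVertex v) {
        SketchTuple2<String, Map<String, Object>> value = new SketchTuple2<>(v.label, v.payload);
        return new SketchTuple2<>(Long.valueOf(v.id), value);
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("firstName", "Alice"); // illustrative property, not taken from real LDBC output
        LdbcVertex person = new LdbcVertex(1L, "person", payload);

        SketchTuple2<Long, SketchTuple2<String, Map<String, Object>>> gellyLike = toGellyShape(person);
        System.out.println(gellyLike.f0 + " -> (" + gellyLike.f1.f0 + ", " + gellyLike.f1.f1 + ")");
    }
}
```

With real Flink types, the same mapping would presumably be a MapFunction producing Vertex<Long, Tuple2<String, Map<String, Object>>>; edges could be handled the same way by folding their extra fields into the edge value.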

Cheers,
Vasia.

On 6 October 2015 at 10:03, Martin Junghanns <m.junghanns@mailbox.org> wrote:
Hi all,

For our benchmarks with Flink, we are using a data generator provided by the LDBC project (Linked Data Benchmark Council) [1][2]. The generator uses MapReduce to create directed, labeled, attributed graphs that mimic properties of real online social networks (e.g., degree distribution, diameter). The output is stored in several files, either locally or in HDFS. Each file represents a vertex, edge or multi-valued property class.

I wrote a little tool that parses and transforms the LDBC output into two datasets representing vertices and edges. Each vertex has a unique ID, a label and a payload according to the LDBC schema. Each edge has a unique ID, a label, source and target vertex IDs, and also a payload according to the schema.

I thought this may be useful for others, so I put it on GitHub [3]. It currently uses Flink 0.10-SNAPSHOT as it depends on some fixes made in there.

Best,
Martin

[1] http://ldbcouncil.org/
[2] https://github.com/ldbc/ldbc_snb_datagen
[3] https://github.com/s1ck/ldbc-flink-import