flink-user mailing list archives

From Martin Junghanns <m.jungha...@mailbox.org>
Subject LDBC Graph Data into Flink
Date Tue, 06 Oct 2015 08:03:12 GMT
Hi all,

For our benchmarks with Flink, we are using a data generator provided by 
the LDBC project (Linked Data Benchmark Council) [1][2]. The generator 
uses MapReduce to create directed, labeled, attributed graphs that mimic 
properties of real online social networks (e.g., degree distribution, 
diameter). The output is stored in several files, either on the local 
file system or in HDFS. Each file represents a vertex, edge or 
multi-valued property class.

I wrote a small tool that parses the LDBC output and transforms it into 
two datasets, one for vertices and one for edges. Each vertex has a 
unique id, a label and a payload according to the LDBC schema. Each edge 
has a unique id, a label, source and target vertex IDs and also a 
payload according to the schema.
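As an illustration only (this is not the code or API of 
ldbc-flink-import), here is a minimal plain-Java sketch of that 
transformation for a single line. It assumes the generator writes 
'|'-delimited CSV files with a header row, that the first column of a 
vertex file is the vertex id, and that the first two columns of an edge 
file are the source and target vertex ids. The class LdbcLineParser and 
its methods are hypothetical. Edge ids are passed in by the caller 
because, under these assumptions, edge rows carry no id of their own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the ldbc-flink-import API): converts one
 * '|'-delimited line of an LDBC-style CSV file into a vertex or an edge
 * carrying an id, a label and a key/value payload.
 */
public class LdbcLineParser {

    /** A vertex with an id, a label and its properties. */
    public static final class Vertex {
        public final long id;
        public final String label;
        public final Map<String, String> properties;

        Vertex(long id, String label, Map<String, String> properties) {
            this.id = id;
            this.label = label;
            this.properties = Collections.unmodifiableMap(properties);
        }
    }

    /** An edge with an id, a label, source/target vertex ids and properties. */
    public static final class Edge {
        public final long id;
        public final String label;
        public final long sourceId;
        public final long targetId;
        public final Map<String, String> properties;

        Edge(long id, String label, long sourceId, long targetId,
             Map<String, String> properties) {
            this.id = id;
            this.label = label;
            this.sourceId = sourceId;
            this.targetId = targetId;
            this.properties = Collections.unmodifiableMap(properties);
        }
    }

    /** Splits on the assumed '|' delimiter, keeping empty trailing fields. */
    static String[] split(String line) {
        return line.split("\\|", -1);
    }

    /** Column 0 is the vertex id; the remaining columns become properties named by the header. */
    public static Vertex parseVertex(String label, String headerLine, String line) {
        String[] header = split(headerLine);
        String[] fields = split(line);
        Map<String, String> props = new HashMap<>();
        for (int i = 1; i < header.length && i < fields.length; i++) {
            props.put(header[i], fields[i]);
        }
        return new Vertex(Long.parseLong(fields[0]), label, props);
    }

    /** Columns 0 and 1 are source and target ids; any further columns become properties. */
    public static Edge parseEdge(long edgeId, String label, String headerLine, String line) {
        String[] header = split(headerLine);
        String[] fields = split(line);
        Map<String, String> props = new HashMap<>();
        for (int i = 2; i < header.length && i < fields.length; i++) {
            props.put(header[i], fields[i]);
        }
        return new Edge(edgeId, label,
                Long.parseLong(fields[0]), Long.parseLong(fields[1]), props);
    }
}
```

In a Flink job, functions like these could be applied in a map over the 
lines returned by ExecutionEnvironment.readTextFile(path) to build the 
vertex and edge DataSets; assigning globally unique ids across files is 
left out of this sketch.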

I thought this might be useful to others, so I put it on GitHub [3]. It 
currently uses Flink 0.10-SNAPSHOT because it depends on some fixes made 
there.

Best,
Martin

[1] http://ldbcouncil.org/
[2] https://github.com/ldbc/ldbc_snb_datagen
[3] https://github.com/s1ck/ldbc-flink-import
