flink-user mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: store and retrieve Graph object
Date Wed, 25 Nov 2015 21:39:51 GMT
Good to know :)

On 25 November 2015 at 21:44, Stefanos Antaris <antaris.stefanos@gmail.com>
wrote:

> Hi,
>
> It works fine using this approach.
>
> Thanks,
> Stefanos
>
> On 25 Nov 2015, at 20:32, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> wrote:
>
> Hey,
>
> you can preprocess your data, create the vertices and store them to a
> file, like you would store any other Flink DataSet, e.g. with writeAsText.
>
> Then, you can create the graph by reading 2 datasets, like this:
>
> DataSet<Vertex> vertices = env.readTextFile("/path/to/vertices/")... // or
> your custom reading logic
> DataSet<Edge> edges = ...
>
> Graph graph = Graph.fromDataSet(vertices, edges, env);
>
> Is this what you're looking for?
>
> Also, note that if you have a very large graph, you should avoid using
> collect() and fromCollection().
>
> -Vasia.
>
> On 25 November 2015 at 18:03, Stefanos Antaris
> <antaris.stefanos@gmail.com> wrote:
>
>> Hi Vasia,
>>
>> my graph object is the following:
>>
>> Graph<MyPojoNode, NullValue, Integer> graph = Graph.fromCollection(
>> edgeList.collect(), env);
>>
>> The vertex ID is a POJO, not the vertex value. So the problem is: how can
>> I store and retrieve the vertex list?
>>
>> Thanks,
>> Stefanos
>>
>> On 25 Nov 2015, at 18:16, Vasiliki Kalavri <vasilikikalavri@gmail.com>
>> wrote:
>>
>> Hi Stefane,
>>
>> let me know if I understand the problem correctly. The vertex values are
>> POJOs that you're somehow inferring from the edge list and this value
>> creation is what takes a lot of time? Since a graph is just a set of 2
>> datasets (vertices and edges), you could store the values to disk and have
>> a custom input format to read them into datasets. Would that work for you?
>>
>> -Vasia.
>>
>> On 25 November 2015 at 15:09, Stefanos Antaris <
>> antaris.stefanos@gmail.com> wrote:
>>
>>> Hi to all,
>>>
>>> I am working on a project with Gelly and I need to create a graph with
>>> billions of nodes. Although I have the edge list, each node in the graph
>>> needs to be a POJO, and constructing these objects takes a long time
>>> before the final graph can be created. Is it possible to store the Graph
>>> object as a file and retrieve it whenever I want to run an experiment?
>>>
>>> Thanks,
>>> Stefanos
>
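
The store-and-reload approach suggested in the thread depends on each
vertex surviving a round trip through a line of text. Below is a minimal
sketch of that step in plain Java. It assumes a hypothetical NodeRecord
POJO (the thread never shows the fields of MyPojoNode) and a tab-separated
line format chosen only for illustration. In a Flink job, toLine would
presumably be applied in a map before writeAsText, and fromLine in a map
after readTextFile; that wiring is an assumption and is not shown here.

```java
import java.util.Objects;

/** Hypothetical vertex POJO; its fields are assumptions, not taken from the thread. */
final class NodeRecord {
    final long id;
    final String label;
    final double weight;

    NodeRecord(long id, String label, double weight) {
        this.id = id;
        this.label = label;
        this.weight = weight;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeRecord)) return false;
        NodeRecord other = (NodeRecord) o;
        return id == other.id
                && label.equals(other.label)
                && Double.compare(weight, other.weight) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, label, weight);
    }
}

/** Converts a NodeRecord to and from one tab-separated text line. */
final class VertexLineCodec {
    private static final String SEP = "\t";

    /** Serializes a node as "id<TAB>label<TAB>weight". */
    static String toLine(NodeRecord n) {
        // The label must not contain the separator or a line break,
        // otherwise the line could not be parsed back unambiguously.
        if (n.label.contains(SEP) || n.label.contains("\n")) {
            throw new IllegalArgumentException("label contains a reserved character");
        }
        return n.id + SEP + n.label + SEP + n.weight;
    }

    /** Parses a line produced by toLine back into a NodeRecord. */
    static NodeRecord fromLine(String line) {
        // Limit -1 keeps trailing empty fields so the field count check is reliable.
        String[] parts = line.split(SEP, -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new NodeRecord(
                Long.parseLong(parts[0]), parts[1], Double.parseDouble(parts[2]));
    }

    public static void main(String[] args) {
        NodeRecord original = new NodeRecord(42L, "user-42", 0.5);
        String line = toLine(original);
        System.out.println(line);
        System.out.println(fromLine(line).equals(original));
    }
}
```

Because the expensive part is building the POJOs, the idea is to pay that
cost once, write the resulting lines out, and let later experiments parse
them back instead of rebuilding the objects from the edge list.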
