giraph-user mailing list archives

From Claudio Martella <>
Subject Re: Giraph input format restrictions
Date Mon, 20 Feb 2012 17:35:32 GMT
You can implement your own way of loading/storing your graph from HDFS,
but I'd suggest putting all the information about a vertex on the same
line. It is simpler to manage, and it is probably the only feasible
way: I guess things would get messy if a vertex's edges started being
spread across different splits. If you check out the examples, you'll
find a JSON-based serialization format. I'd suggest you try something
like that, so you can also reuse most of the input/output formats that
work with those examples. Beware that this code isn't the most
efficient when you have big adjacency lists, due to Text's memory
overhead. I'd suggest using a streaming API like Jackson and avoiding
the conversion from Text to String (Jackson should work from byte[]).
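
To make the "one vertex per line" idea concrete, here is a minimal
sketch of how the edge-per-row file from your mail (format1) could be
grouped into one line per source vertex (format2). It is plain Python,
not Giraph code. The function name edges_to_adjacency_lines is made up
for this illustration, and it assumes the whole edge list fits in
memory and that vertex ids can be sorted as strings. For a large graph
you would more likely do the same grouping as a preprocessing job on
the cluster.

```python
def edges_to_adjacency_lines(edge_lines):
    """Group 'src dst' rows into one 'src dst1 dst2 ...' line per source vertex.

    Hypothetical helper for illustration only; it is not part of Giraph.
    """
    adjacency = {}
    for line in edge_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank rows
        if len(parts) != 2:
            raise ValueError("expected 'src dst', got: %r" % line)
        src, dst = parts
        # Keep targets in the order in which they appear in the input.
        adjacency.setdefault(src, []).append(dst)
    # Emit vertices sorted by id (string order, an assumption of this sketch).
    return [" ".join([src] + adjacency[src]) for src in sorted(adjacency)]


if __name__ == "__main__":
    format1 = ["a b", "a c", "a d", "b c", "b a", "c d"]
    for out_line in edges_to_adjacency_lines(format1):
        print(out_line)
```

Note that a vertex with no outgoing edges (d in your example) gets no
line of its own here, which matches your format2. Whether that is
acceptable depends on the input format you end up using.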

Hope this helps,

On Sun, Feb 19, 2012 at 8:25 PM, yavuz gokirmak <> wrote:
> Hi,
> In Shortest Paths Example it is written that "Currently there is one
> restriction on the VertexInputFormat that is not obvious. The vertices must
> be sorted.". I don't understand the reason for this restriction: why
> should vertices be ordered?
> Secondly, as I understand it, we have to transform our initial data into a
> form in which each line corresponds to a vertex (with its edges and values,
> if any) in the graph.
> For example, I have data in which each row corresponds to an edge between
> two vertices
> format1:
> a b
> a c
> a d
> b c
> b a
> c d
> Do I have to convert this file into a format similar to below in order to
> use with giraph algorithms?
> format2:
> a b c d
> b c a
> c d
> thanks..

   Claudio Martella
