giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: sorted output
Date Fri, 22 Mar 2013 04:24:43 GMT
This is not super easy to do, but possible.

1)  You would probably need to use the range partitioner to partition 
the vertices (or else they will be interspersed across partitions).
2)  You would need to add a partition store implementation that kept the 
vertices sorted (e.g. backed by a TreeMap)
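
To make the two steps above concrete, here is a rough standalone sketch
in plain Java. It does not use Giraph's actual partitioner or partition
store interfaces; the class SortedPartitionSketch and its methods
(rangePartition, buildPartitions, writeOrder) are hypothetical names
made up for illustration, and it assumes non-negative long vertex ids
with a known maximum. It only shows why range partitioning plus a
TreeMap per partition yields sorted output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Giraph API.
class SortedPartitionSketch {

    // Range partitioning: contiguous id ranges land in the same partition.
    // Assumes 0 <= vertexId <= maxVertexId and numPartitions >= 1.
    static int rangePartition(long vertexId, long maxVertexId, int numPartitions) {
        long rangeSize = (maxVertexId / numPartitions) + 1;
        return (int) (vertexId / rangeSize);
    }

    // One TreeMap per partition, so each partition iterates in ascending id order.
    static List<TreeMap<Long, String>> buildPartitions(
            Map<Long, String> vertices, long maxVertexId, int numPartitions) {
        List<TreeMap<Long, String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<Long, String> e : vertices.entrySet()) {
            int p = rangePartition(e.getKey(), maxVertexId, numPartitions);
            partitions.get(p).put(e.getKey(), e.getValue());
        }
        return partitions;
    }

    // The order in which vertex ids would be written if partitions are
    // saved one after another, lowest range first.
    static List<Long> writeOrder(List<TreeMap<Long, String>> partitions) {
        List<Long> order = new ArrayList<>();
        for (TreeMap<Long, String> partition : partitions) {
            order.addAll(partition.keySet());
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Long, String> vertices = new HashMap<>();
        vertices.put(42L, "a");
        vertices.put(7L, "b");
        vertices.put(99L, "c");
        vertices.put(50L, "d");
        vertices.put(3L, "e");
        List<TreeMap<Long, String>> partitions = buildPartitions(vertices, 99L, 4);
        System.out.println(writeOrder(partitions)); // prints [3, 7, 42, 50, 99]
    }
}
```

With a hash-style partitioner instead, ids 3 and 7 could end up in
different partitions from each other and be mixed with higher ids, so
each partition would be sorted internally but the ranges would
overlap.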

Alternatively, of course, you can write a simple map-reduce job to do 
the sort.

On 3/21/13 5:58 PM, Ameet Kini wrote:
> Is it possible to save the final output sorted by vertex id? My
> vertices have their id of type long, and I am using
> SequenceFileOutputFormat, where the key of the sequence file is the
> vertex id of type long. If the vertices were somehow written in sorted
> order, I could even switch to using Hadoop's MapFileOutputFormat,
> which expects sorted keys. I understand that if there are multiple
> workers, there won't be a total order on the keys, and that's fine, as
> long as each worker writes its output sorted by vertex id.
>
> I was looking at the code, and it looks like the call to writeVertex is
> made in BspServiceWorker.saveVertices. It looks like there is no way to
> control the order of vertices, but I may be missing something. Any
> pointers or examples would help.
>
> Thanks,
> Ameet

