giraph-user mailing list archives

From Ravikant Dindokar <>
Subject Re: Request for information on Giraph custom Partitioner using external service
Date Wed, 11 Jul 2018 14:02:20 GMT
Hi Neha,

Let us assume that you are using a partitioning tool whose output gives,
for each vertex, the vertex id followed by its partition id, separated by a
delimiter.

Now you need to write a few classes to get the job done.

# Create a vertex class which implements the Writable interface and can
store the vertex id and the partition id

public class MyVertex implements WritableComparable<MyVertex> {

    private short partition;
    private long id;

    public MyVertex() {
    }

    // Get the partition from tokens[1] and the id from tokens[0] in the
    // constructor by specifying the delimiter.
    public MyVertex(String id) {
        String[] tokens = id.split(DELIMITER);
        this.partition = Short.parseShort(tokens[1]);
        this.id = Long.parseLong(tokens[0]);
    }

    // You have to override the rest of the methods.
}
Use this class as the vertex id type produced by the vertex input format
that you specify with the -vif option when submitting the application.
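
As a rough illustration of "the rest of the methods", here is a minimal sketch of the parsing and serialization logic. To keep it self-contained it does not depend on Hadoop: the class name PartitionedId and the ":" delimiter are assumptions made up for this example, and write/readFields only mirror the shape of the Writable methods. In a real job the class would implement WritableComparable as shown above.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical stand-in for MyVertex that compiles without Hadoop. */
class PartitionedId implements Comparable<PartitionedId> {
    // Assumed separator between vertex id and partition id, e.g. "42:3".
    private static final String DELIMITER = ":";

    private long id;
    private short partition;

    // No-argument constructor, needed so an instance can be filled by readFields.
    PartitionedId() { }

    // Parses "<id><DELIMITER><partition>".
    PartitionedId(String text) {
        String[] tokens = text.split(DELIMITER);
        this.id = Long.parseLong(tokens[0]);
        this.partition = Short.parseShort(tokens[1]);
    }

    long getId() { return id; }

    short getPartition() { return partition; }

    // Same shape as Writable.write(DataOutput): serialize both fields.
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeShort(partition);
    }

    // Same shape as Writable.readFields(DataInput): read in the same order.
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        partition = in.readShort();
    }

    // Order and equality use the vertex id only; the partition is metadata.
    @Override
    public int compareTo(PartitionedId other) {
        return Long.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PartitionedId && ((PartitionedId) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

Comparing and hashing on the vertex id alone is a design choice of this sketch: two ids that name the same vertex stay equal even if the partition field is missing or differs.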

# Implement GraphPartitionerFactory<I, V, E> such that it will invoke your
custom worker partitioner

public WorkerGraphPartitioner<I, V, E> createWorkerGraphPartitioner() {
    return new MyWorkerPartitioner<I, V, E>();
}

Specify this class via -ca giraph.graphPartitionerFactoryClass when
submitting the application.

# Provide an implementation for the worker partitioner, something like this

public class MyWorkerPartitioner<I extends WritableComparable,
        V extends Writable, E extends Writable>
        extends HashWorkerPartitioner<I, V, E> {

    @Override
    public PartitionOwner getPartitionOwner(I vertexId) {
        // Write logic such that this method returns the owner of the
        // desired partition, using the partition id stored in vertexId.
    }
}
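
The selection logic inside getPartitionOwner could look roughly like the sketch below. It is plain Java and only illustrates the idea: PartitionLookup and ownerFor are hypothetical names, and the generic type T stands in for Giraph's PartitionOwner. It assumes the partitioner holds its partition owners in a list and that the partition id comes from the vertex id class described above.

```java
import java.util.List;

/** Hypothetical helper; T stands in for Giraph's PartitionOwner. */
final class PartitionLookup {
    private PartitionLookup() { }

    /**
     * Returns the owner at the position given by the precomputed partition id.
     * Math.floorMod keeps the index in range if the external tool produced
     * more partition ids than there are owners in the list.
     */
    static <T> T ownerFor(short partition, List<T> owners) {
        if (owners.isEmpty()) {
            throw new IllegalArgumentException("no partition owners");
        }
        return owners.get(Math.floorMod(partition, owners.size()));
    }
}
```

Because the partition id travels with the vertex id, nothing has to be looked up in an external file or a shared map when this method is called.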

Hope this helps!


On Tue, Jul 10, 2018 at 6:53 PM Neha Raj <> wrote:

> Hi,
> I am working on graph partitioning algorithms and have chosen Giraph as
> the graph processing system to run graph problems; I am very new to both. I
> would like to provide external partitioning information (in the form of a
> text file) to Giraph. For this I have created a custom partitioner (something
> like HashPartitionFactory), which reads the external file for the graph
> partition id. While debugging I realized that this partition logic is
> invoked several times (during the Giraph supersteps), and reading the same
> external file multiple times is not time efficient. To handle this I wish
> to create a global (across the distributed system) Map variable which holds
> {vertex id, partition id} as key-value pairs, and I want to populate this
> variable from the external file once during a Giraph job run. I have tried
> several ways to create and initialize such a global variable, but whether
> the variable gets populated for a Giraph job is non-deterministic (i.e.
> sometimes the map is populated with values, sometimes not).
> I think there might be some issue in how I am creating the Map variable
> and initializing it so that it is ready before my custom partitioning logic
> calls it. Can somebody please guide me to the correct place to plug this
> information into a Giraph job, and possibly to a correct way of creating a
> global variable with respect to Giraph's distributed processing?
> Thanks & Regards,
> Neha
