giraph-user mailing list archives

From Ravikant Dindokar <ravikant.i...@gmail.com>
Subject Re: Request for information on Giraph custom Partitioner using external service
Date Wed, 11 Jul 2018 14:02:20 GMT
Hi Neha,

Let us assume that you are using a partitioning tool that writes its output
in the following format:

vertexId<delimiter>partitionId

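You then need to write a few classes to get the job done. Because each
vertex id carries its own partition id, the partitioner does not have to
read the external file or look anything up in a shared map.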

# Create a vertex id class that implements the WritableComparable interface
and stores both the vertex id and the partition id

e.g.
public class MyVertex implements WritableComparable<MyVertex> {

    private short partition;
    private long id;

    public MyVertex() {
    }

    // Split the input on the delimiter used by your partitioning tool:
    // tokens[0] is the vertex id and tokens[1] is the partition id.
    public MyVertex(String id) {
        String[] tokens = id.split(DELIMITER);
        this.partition = Short.parseShort(tokens[1]);
        this.id = Long.parseLong(tokens[0]);
    }

    // You still have to override the remaining methods: write(),
    // readFields(), compareTo(), hashCode() and equals().
}

Use this class as the vertex id type (I) in your computation and in the
vertex input format that you pass with the -vif option when submitting the
application.
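
As a rough illustration of the methods that still have to be written, here is a minimal sketch of such an id class. It is not taken from Giraph: the class name MyVertexIdSketch, the "," delimiter and the roundTrip() helper are assumptions made for the example, and the class depends only on java.io so that it runs on its own. In a real job it would declare "implements WritableComparable<...>" from Hadoop, whose write(DataOutput) and readFields(DataInput) methods have the same shape as the ones below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Sketch only: a vertex id that also carries its partition id.
final class MyVertexIdSketch implements Comparable<MyVertexIdSketch> {
    // Hypothetical delimiter; use whatever your partitioning tool emits.
    static final String DELIMITER = ",";

    private long id;
    private short partition;

    // No-argument constructor, needed so the object can be created
    // empty and then filled in by readFields().
    MyVertexIdSketch() {
    }

    // Parses one "vertexId<delimiter>partitionId" line.
    MyVertexIdSketch(String line) {
        // Pattern.quote keeps delimiters such as "|" from being
        // interpreted as regular expressions by split().
        String[] tokens = line.trim().split(Pattern.quote(DELIMITER));
        this.id = Long.parseLong(tokens[0]);
        this.partition = Short.parseShort(tokens[1]);
    }

    long getId() { return id; }

    short getPartition() { return partition; }

    // Serialize both fields, in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeShort(partition);
    }

    // Deserialize in the same order as write().
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        partition = in.readShort();
    }

    // Identity is the vertex id alone; the partition is extra payload.
    @Override
    public int compareTo(MyVertexIdSketch other) {
        return Long.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MyVertexIdSketch
                && ((MyVertexIdSketch) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }

    // Helper for the example: serialize and deserialize in memory.
    static MyVertexIdSketch roundTrip(MyVertexIdSketch v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            MyVertexIdSketch copy = new MyVertexIdSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Basing equals(), hashCode() and compareTo() on the vertex id alone is a design choice of this sketch: two ids that name the same vertex then compare equal whatever partition they carry.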

# Implement GraphPartitionerFactory<I, V, E> so that it returns your custom
worker partitioner
e.g.
@Override
public WorkerGraphPartitioner<I, V, E> createWorkerGraphPartitioner() {
    return new MyWorkerPartitioner<I, V, E>();
}

Specify this class with -ca giraph.graphPartitionerFactoryClass=<your
factory class> when submitting the application.

# Provide an implementation of the worker partitioner, something like this:

public class MyWorkerPartitioner<I extends WritableComparable,
        V extends Writable, E extends Writable>
        extends HashWorkerPartitioner<I, V, E> {

    @Override
    public PartitionOwner getPartitionOwner(I vertexId) {
        // Write the logic here so that this method returns the owner of
        // the desired partition, using the partition id stored in the
        // vertex id.
    }
}
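
The lookup inside getPartitionOwner() could follow the pattern sketched below. This is only an illustration under an assumption: that the partitioner keeps a list of partition owners that can be indexed by partition id. Check how your Giraph version stores them. The names PartitionLookupSketch, ownerIndex and pickOwner are made up for the example, and the list is generic so that the code runs without Giraph on the classpath.

```java
import java.util.List;

// Sketch only: map the partition id carried by a vertex id to an
// entry in a list of partition owners.
final class PartitionLookupSketch {

    // Returns a valid index into a list with ownerCount entries.
    // floorMod keeps the result in [0, ownerCount) even if the tool
    // produced more partitions than there are owners, or a negative id.
    static int ownerIndex(int partitionId, int ownerCount) {
        if (ownerCount <= 0) {
            throw new IllegalArgumentException("no partition owners");
        }
        return Math.floorMod(partitionId, ownerCount);
    }

    // Picks the owner for the given partition id. In the partitioner,
    // T would be PartitionOwner and partitionId would come from the
    // vertex id.
    static <T> T pickOwner(List<T> owners, int partitionId) {
        return owners.get(ownerIndex(partitionId, owners.size()));
    }
}
```

If the number of partitions produced by the external tool matches the number of partitions in the job, the modulo is a no-op and every vertex lands in exactly the partition the tool assigned.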


Hope this helps!

Thanks
Ravikant


On Tue, Jul 10, 2018 at 6:53 PM Neha Raj <neharaj.06@gmail.com> wrote:

> Hi,
>
> I am working on graph partitioning algorithms and have chosen Giraph as
> the graph processing system to run graph problems on; I am very new to
> both. I would like to provide external partitioning information (in the
> form of a text file) to Giraph. For this I have created a custom
> partitioner (something like HashPartitionerFactory), which reads the
> external file to get each vertex's partition id.
>
> While debugging I realized that this partitioning logic is invoked
> several times (during the Giraph supersteps), and reading the same
> external file multiple times is inefficient. To handle this I want to
> create a global (across the distributed system) Map variable that holds
> {vertex id, partition id} as key-value pairs, and I want to populate it
> from the external file once per Giraph job run. I have tried several
> ways to create and initialize such a global variable, but whether it is
> actually populated for a given Giraph job is nondeterministic (i.e.
> sometimes the map is populated with values, sometimes not).
>
> I think there might be an issue in how I am creating the Map variable
> and making sure it is initialized before my custom partitioning logic
> uses it. Can somebody please point me to the correct place to plug this
> information into a Giraph job, and possibly to a correct way of creating
> a global variable with respect to Giraph's distributed processing?
>
> Thanks & Regards,
> Neha
>
