giraph-user mailing list archives

From Pavan Kumar A <>
Subject RE: Best way to know the assignment of vertices to workers
Date Sat, 29 Nov 2014 05:53:39 GMT
I wrote a diff some time ago that lets you do exactly that.
You can find the implementation details at:
Some options you can use are:

    -Dgiraph.mappingStoreClass=org.apache.giraph.mapping.LongByteMappingStore
    -Dgiraph.lbMappingStoreUpper=1987000
    -Dgiraph.lbMappingStoreLower=4096
    # Mapping store ops information
    -Dgiraph.mappingStoreOpsClass=org.apache.giraph.mapping.DefaultEmbeddedLongByteOps
    # Embed mapping information
    -Dgiraph.edgeTranslationClass=org.apache.giraph.mapping.translate.LongByteTranslateEdge
    # PartitionerFactory to be used
    -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.LongMappingStorePartitionerFactory
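If you launch jobs from a Java driver rather than the command line, it may help to keep these keys in one place. The sketch below is hypothetical (the class name `MappingOptions` is illustrative, not part of Giraph); it only collects the property names and values listed above and renders them as `-D` arguments. In a real driver you would more likely pass each pair to Hadoop's `Configuration.set(String, String)` on the job configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: gathers the mapping-related Giraph options from
// this thread and renders them as -Dkey=value command-line arguments.
class MappingOptions {
    static Map<String, String> options() {
        // LinkedHashMap keeps insertion order, so the output is stable.
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("giraph.mappingStoreClass",
                "org.apache.giraph.mapping.LongByteMappingStore");
        opts.put("giraph.lbMappingStoreUpper", "1987000");
        opts.put("giraph.lbMappingStoreLower", "4096");
        opts.put("giraph.mappingStoreOpsClass",
                "org.apache.giraph.mapping.DefaultEmbeddedLongByteOps");
        opts.put("giraph.edgeTranslationClass",
                "org.apache.giraph.mapping.translate.LongByteTranslateEdge");
        opts.put("giraph.graphPartitionerFactoryClass",
                "org.apache.giraph.partition.LongMappingStorePartitionerFactory");
        return opts;
    }

    // Turns each key/value pair into a single "-Dkey=value" token.
    static List<String> asArgs(Map<String, String> opts) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : opts.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        // One argument per line, ready to paste into a launch script.
        asArgs(options()).forEach(System.out::println);
    }
}
```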
Just like vertex input and edge input, there is now a mapping input. I only implemented all of this
for giraph-hive, so if you have a Hive table with the mapping vertexId -> workerNum, then you
can pass the mapping input like:
"org.apache.giraph.hive.input.mapping.examples.LongInt2ByteHiveToMapping, $mapping_table,
You can go through the code for each of these options to see what they do. 
Using this, you can (sort of) pre-assign vertex ids to workers. If you assign two vertices
to the same worker, say worker-1, they are guaranteed to end up on the same worker. The numbering
(i.e. identification/naming) of workers is consistent: if a and b are both assigned worker-x,
they are guaranteed to be on the same worker, but we do not know ahead of time which worker that
will be, and it cannot be set explicitly by the user (which, from what I can tell, is what you
want to do).
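To make that guarantee concrete, here is a toy model, not Giraph code: the mapping input supplies a logical worker number per vertex id, and at job start the logical numbers are bound to physical workers in an order the user does not control (modelled here, purely as an assumption for illustration, by shuffling a list). All names (`MappingModel`, `physicalWorker`) are hypothetical. Vertices that share a logical number always land together, even though the physical index can change between runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy model (hypothetical, not Giraph's implementation) of the guarantee:
// same logical worker number => same physical worker, but which physical
// worker that is gets decided at runtime.
class MappingModel {
    private final Map<Long, Byte> vertexToLogical; // from the mapping input
    private final List<Integer> logicalToPhysical; // fixed at job start

    MappingModel(Map<Long, Byte> vertexToLogical, int numWorkers, long seed) {
        this.vertexToLogical = vertexToLogical;
        this.logicalToPhysical = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            logicalToPhysical.add(i);
        }
        // Stand-in for the runtime binding the user cannot control.
        Collections.shuffle(logicalToPhysical, new Random(seed));
    }

    int physicalWorker(long vertexId) {
        Byte logical = vertexToLogical.get(vertexId);
        if (logical == null) {
            throw new IllegalArgumentException("no mapping for " + vertexId);
        }
        return logicalToPhysical.get(logical.intValue());
    }

    public static void main(String[] args) {
        Map<Long, Byte> mapping = new HashMap<>();
        mapping.put(10L, (byte) 1); // a and b both on logical worker 1
        mapping.put(42L, (byte) 1);
        mapping.put(7L, (byte) 0);
        for (long seed = 0; seed < 3; seed++) {
            MappingModel m = new MappingModel(mapping, 4, seed);
            System.out.println("run " + seed + ": 10 -> " + m.physicalWorker(10L)
                    + ", 42 -> " + m.physicalWorker(42L)
                    + ", 7 -> " + m.physicalWorker(7L));
        }
    }
}
```

Because the binding is a permutation, distinct logical numbers never collide, which matches the "consistent numbering, unknown physical worker" behaviour described above.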
If you are using something other than Hive, you will have to implement all the interfaces
of MappingInputFormat, and then you can easily achieve what you want.
Subject: Best way to know the assignment of vertices to workers
Date: Fri, 28 Nov 2014 12:02:59 +0000

Hi all,

Is there a clean way to find out which worker a particular vertex is assigned to?

From what I tried, I found that given n workers, each vertex is assigned to the worker
with id (vertex_id % n). Is that a safe way to do this?
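One reason to be cautious about relying on `vertex_id % n`: in Giraph, vertices are grouped into partitions and partitions are assigned to workers, so the observed rule depends on how both steps happen to be configured. If placement goes through a hash of the id followed by a modulo (a common scheme; the exact behaviour of Giraph's default partitioner should be checked in its source), plain `%` can disagree with it. The hypothetical sketch below assumes ids are hashed with Java's `Long.hashCode` and compares the two rules. They coincide for non-negative ids below 2^31, but diverge for negative ids and can diverge for large ids.

```java
// Hypothetical comparison of two assignment rules; neither is claimed
// to be Giraph's exact partitioning code.
class AssignmentRules {
    // The rule observed in the question: vertex_id % n.
    static long naive(long vertexId, int n) {
        return vertexId % n; // Java's % is negative for negative ids
    }

    // Hash-then-modulo, assuming ids are hashed with Long.hashCode.
    static int hashed(long vertexId, int n) {
        return Math.abs(Long.hashCode(vertexId) % n);
    }

    public static void main(String[] args) {
        int n = 7;
        long[] ids = {5L, 123L, -3L, 1L << 32};
        for (long id : ids) {
            System.out.println(id + ": naive=" + naive(id, n)
                    + ", hashed=" + hashed(id, n));
        }
    }
}
```

For example, with n = 7 the id -3 gives -3 under the naive rule (not a valid worker index) but 2 under the hashed rule, and 2^32 gives 4 versus 1.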

I’ve had a look at previous discussions, but most of them have no answer.


Why I need it:

In my application, each vertex needs some additional metadata, which is loaded from
a file. This metadata file is huge (>50 GB), so on each worker I only want to load the
metadata for the vertices present on that worker.
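The per-worker filtering described here could look roughly like the sketch below. It assumes a tab-separated metadata file with the vertex id in the first column (a hypothetical format) and an assignment function supplied by the caller, for example one derived from a vertexId -> workerNum mapping. `LocalMetadataLoader` and its parameters are illustrative names, not Giraph API. It streams the input line by line so the full file never has to fit in memory; only the rows belonging to this worker are kept. In a real job the `Reader` would wrap the file on HDFS or local disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongToIntFunction;

// Hypothetical loader: keeps only the metadata rows whose vertex id is
// assigned to this worker according to the supplied assignment function.
class LocalMetadataLoader {
    static Map<Long, String> load(Reader source, int thisWorker,
                                  LongToIntFunction workerOf) throws IOException {
        Map<Long, String> local = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip malformed rows without an id column
                }
                long id = Long.parseLong(line.substring(0, tab));
                if (workerOf.applyAsInt(id) == thisWorker) {
                    local.put(id, line.substring(tab + 1));
                }
            }
        }
        return local;
    }

    public static void main(String[] args) throws IOException {
        String file = "1\tred\n2\tgreen\n3\tblue\n4\tblack\n";
        // Toy assignment for the demo only: ids alternate between 2 workers.
        Map<Long, String> mine =
                load(new StringReader(file), 0, id -> (int) (id % 2));
        System.out.println(mine); // worker 0 keeps ids 2 and 4
    }
}
```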


Previous discussions:
