You can go through the code for each of these options to see what they do.
Using this, you can pre-assign vertex IDs to workers. If you assign two vertices to the same worker, say worker-1, they are guaranteed to end up on the same worker. The numbering (i.e. the identification or naming) of workers is consistent: if a and b are both assigned to worker-x, they are guaranteed to be on the same worker, but we do not know ahead of time which worker that will be. The numbering cannot be set explicitly by the user (which, from what I can tell, is what you want to do).
If you are using something other than Hive, you will have to implement all the interfaces of MappingInputFormat yourself, and then you can easily achieve what you want.
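
To make the guarantee above concrete, here is a minimal sketch in plain Java. It is not Giraph code and does not use MappingInputFormat; the class VertexLabelMapping and its methods are hypothetical names made up for illustration. It only models the idea that each vertex ID carries a logical worker label, and that two vertices with the same label end up together, without saying which physical worker hosts that label.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: models "vertex id -> logical worker label".
// It does not implement or call any Giraph interface.
final class VertexLabelMapping {
    private final Map<Long, Byte> labelByVertex = new HashMap<>();

    // Record that a vertex has been pre-assigned to a logical worker label.
    void assign(long vertexId, byte label) {
        labelByVertex.put(vertexId, label);
    }

    // True only if both vertices have a label and the labels are equal.
    // Equal labels mean "same worker", not "this particular worker".
    boolean coLocated(long a, long b) {
        Byte labelA = labelByVertex.get(a);
        Byte labelB = labelByVertex.get(b);
        return labelA != null && labelA.equals(labelB);
    }
}
```

The sketch deliberately has no method that returns a physical worker, since that is the part that is not known ahead of time.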
From: firstname.lastname@example.org
To: email@example.com
Subject: Best way to know the assignment of vertices to workers
Date: Fri, 28 Nov 2014 12:02:59 +0000
Is there a clean way to find out which worker a particular vertex is assigned to?
From what I have tried, given n workers, each vertex is assigned to the worker with id (vertex_id % n). Is it safe to rely on this?
I've had a look at previous discussions, but most of them have no answer.
Why I need it:
In my application, each vertex needs to know some additional metadata, which is loaded from a file. This metadata file is huge (>50 GB), so on each worker I only want to load the metadata corresponding to the vertices present on that worker.
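
A rough sketch of the loading step described above, in plain Java. It assumes the set of vertex IDs on the worker is already known by some means, and it assumes a hypothetical file layout of one "vertexId<TAB>metadata" record per line; neither assumption comes from Giraph, and LocalMetadataLoader is a made-up name. The file is streamed line by line, so only the matching records are kept in memory, although each worker still reads through the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps only the metadata records whose vertex id
// is in the given set of local vertex ids.
final class LocalMetadataLoader {
    static Map<Long, String> loadFor(Set<Long> localVertexIds, BufferedReader reader) {
        Map<Long, String> result = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without the assumed separator
                }
                long id;
                try {
                    id = Long.parseLong(line.substring(0, tab));
                } catch (NumberFormatException e) {
                    continue; // skip lines whose id field is not a number
                }
                if (localVertexIds.contains(id)) {
                    result.put(id, line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```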