giraph-user mailing list archives

From Avery Ching
Subject Re: question about vertex instantiation location. . .
Date Fri, 10 Feb 2012 22:04:48 GMT
Even if you start with two vertices, the number of partitions is based 
on the number of workers squared multiplied by a multiplier (see 
HashMasterPartitioner#PARTITION_COUNT_MULTIPLIER).  By default, the 
multiplier is 1, so if you have, say, 10 workers, you'll have 100 
partitions.  There is a maximum number of partitions, though (about 
2995), due to the max zknode size.  Since the partitions are spread 
across all the workers, new vertices can land on workers that started 
out empty.  So everything should be fine for you.
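
To make the arithmetic concrete, here is a rough sketch in Java.  It is 
not Giraph's actual code: the class and method names are made up for 
illustration, the cap is simply the "about 2995" figure mentioned above, 
and the modulo-hash and round-robin assignments are assumptions about how 
a hash partitioner balanced by count might behave.

```java
// Hypothetical sketch, not taken from the Giraph source.
class PartitionSketch {
    // Assumed cap, taken from the "about 2995" figure in this thread.
    static final int MAX_PARTITIONS = 2995;

    // Partition count = multiplier * workers^2, clamped to [1, MAX_PARTITIONS].
    static int partitionCount(int workers, float multiplier) {
        long wanted = (long) (multiplier * workers * workers);
        return (int) Math.max(1, Math.min(wanted, MAX_PARTITIONS));
    }

    // Assumption: a vertex ID maps to a partition by its hash code.
    // floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(Object vertexId, int partitionCount) {
        return Math.floorMod(vertexId.hashCode(), partitionCount);
    }

    // Assumption: partitions are dealt out to workers round-robin,
    // which gives each worker an equal share by count.
    static int workerFor(int partition, int workers) {
        return partition % workers;
    }

    public static void main(String[] args) {
        int workers = 10;
        int partitions = partitionCount(workers, 1.0f);
        System.out.println(partitions); // prints 100

        // A vertex created later by a message still hashes into one of
        // the pre-created partitions, whichever worker owns it.
        Integer newVertexId = 123456;
        int partition = partitionFor(newVertexId, partitions);
        System.out.println("vertex " + newVertexId + " -> partition "
            + partition + " -> worker " + workerFor(partition, workers));
    }
}
```

With 20 workers and the default multiplier this gives 400 partitions, so 
around 100,000 vertex IDs with a reasonably uniform hash would be spread 
over all 20 workers rather than only the two that held the initial 
vertices.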


On 2/10/12 1:52 PM, David Garcia wrote:
> Ah, so, I think I would like to balance by vertices.  My main question is
> that my graph starts with two vertices. . .I would like to specify more
> than two mappers.  My job will end up creating around 100,000 vertices.  I
> would like to make sure that these extra vertices will be evenly
> distributed across all mappers (including the ones that don't have the
> initial two vertices).  Does this make sense?  Does Giraph support this
> out of the box, or do I need to add something?  Thx.
> -David
> On 2/10/12 3:41 PM, "Avery Ching" wrote:
>> By default, you are using the HashPartitionerFactory.  This will create
>> the partitions ahead of time and balance them equally by count to the
>> workers.  Therefore, assuming you have a uniform distribution across the
>> VertexId space, the graph should be balanced across the workers evenly
>> according to the number of vertices.  If you look at PartitionBalancer, you
>> can try to rebalance the graph if you like as it is running.  This is a
>> bit experimental, but should work.  The choices for balancing are (no
>> balancing, balance by edges or balance by vertices).
>> Hope that helps,
>> Avery
>> On 2/10/12 1:25 PM, David Garcia wrote:
>>> Hey guys. . .I have a question about "dynamic" vertex instantiation via
>>> the sendMsg(. . .) method.  I have a job that starts processing on a
>>> sequenceFile with only two vertices in it.  Each vertex has information
>>> in its value that tells it what vertices are adjacent to it.  The primary
>>> reason I'm doing this is to avoid loading the entire graph into the job.
>>> There are many vertices that won't do any processing (no need to load
>>> them).  I would like to take my two vertices and "dynamically" build the
>>> graph by sending messages.  So far, my experimentation shows that this
>>> is promising. . .but I have a question WRT load balancing for new vertex
>>> instantiation.  When I call sendMsg(newVertexID), where will the vertex
>>> be instantiated?  If I specify 20 mappers (but with only two vertices in
>>> my sequence file), obviously there is going to be at least one mapper
>>> without a vertex.  Is it possible that the vertex for
>>> sendMsg(newVertexID) will be instantiated on an empty mapper?  I would
>>> like this. . .for load balancing purposes.
>>> -david
