incubator-giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Severin Andreas Corsten <>
Subject Re: Input And Partitioning
Date Fri, 09 Sep 2011 03:08:16 GMT
Thank you very much, you helped me a lot. 

Kind regards / Mit freundlichen Grüßen

Severin Andreas Corsten
DHBW-Student Business Informatics 2009 - University Programs
IBM Sales & Distribution, Human Resources

Phone: 1-408-927-2750 

Hechtsheimer Str. 2
Mainz, 55131

IBM Deutschland Management & Business Support GmbH / Vorsitzender des 
Aufsichtsrats: Martin Jetter
Geschäftsführung: Martina Koederitz (Vorsitzende), Reinhard Reschke, 
Dieter Scholz, Gregor Pillen, Joachim Heel, Christian Noll
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, 
HRB 24938

From:   Avery Ching <>
Cc:     Joseph Boyd <>
Date:   08.09.2011 13:29
Subject:        Re: Input And Partitioning

Great answers and suggestion Joe.  I just wanted to inline a few 
comments to what Joe wrote.

On 9/8/11 11:27 AM, Joseph Boyd wrote:
> On Thu, Sep 8, 2011 at 11:07 AM, Severin Andreas Corsten
> <>  wrote:
>> 1: Am I right in the assumption that Giraph does not split the input 
file by itself. Assume that I have got a graph in one single file,
>> Giraph sends the whole graph to one worker while the rest of the 
workers is just idle.
> Giraph uses the number of InputSplits returned by your
> VertexInputFormat.getSplits() implementation.
> For VertextInputFormats that wrap Hadoop TextInputFormat, what you've
> said will be true, and graph input in one, small file, will all be
> sent to one worker.  As a cheap work-around for this, we've the
> FileInputFormat split size arbitrarily small :
>            FileInputFormat.setMaxInputSplitSize(bspJob, 1048576); //
> number of bytes in one meg
> Additionally, Giraph has re-balancing features that can give work to
> under-used workers in subsequent supersteps, but I haven't played with
> them.  (I'm not sure they would have helped me anyway, as my graph,
> even though it had a small input file, wouldn't fit into memory on one
> worker).
>> 2: I read through the source code and found a part saying that vertices 
must be presented in id-order. Is that a task the user has
>> to do or is there a workaround to have vertices not in id-order?
> Sorting the input into Id-order is for the user to do.  There are open
> JIRAs, like GIRAPH-11 [1] to improve the situation here.
>> 3: The VertexRange class provides the assignment between vertices and 
workers. Is there  a way to override the
>> standard implementation and use a custom assignment system?
> I have no idea, but the work in GIRAPH-11 will probably give a clue
> what's involved.
GIRAPH-11 will change a lot about the way vertices are assigned.  There 
will be an option for hashing, hash ranges, or user-defined ranges. 
There is also a way to control the assignment of vertex ranges to at 
some level right now (this will likely change a bit as well after 

In GiraphJob, there is a method

      * Set the vertex range balancer class (optional)
      * @param vertexRangeBalancerClass Determines how vertex
      *        ranges are balanced prior to each superstep
     final public void setVertexRangeBalancerClass(
             Class<?> vertexRangeBalancerClass) {

By default, we use the StaticBalancer, it doesn't move vertices at all. 
There is also an AutoBalancer that tries to balance the graph based on 
vertices or edges.  You can also write you own.  Hope that helps.
> ...joe
> [1]
>> Thanks in advance.
>> Kind regards / Mit freundlichen Grüßen
>> Severin Andreas Corsten
>> DHBW-Student Business Informatics 2009 - University Programs
>> IBM Sales&  Distribution, Human Resources
>> WI09N-M
>> ________________________________
>> Phone: 1-408-927-2750
>> Mobile (Germany): 49-160-98976935
>> E-mail:
>> Hechtsheimer Str. 2
>> Mainz, 55131
>> Germany

View raw message