giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Input And Partitioning
Date Thu, 08 Sep 2011 20:29:53 GMT
Great answers and suggestions, Joe.  I just wanted to inline a few 
comments on what Joe wrote.

On 9/8/11 11:27 AM, Joseph Boyd wrote:
> On Thu, Sep 8, 2011 at 11:07 AM, Severin Andreas Corsten
> <severin.corsten@de.ibm.com>  wrote:
>
>> 1: Am I right in the assumption that Giraph does not split the input file by itself?
>> Assume that I have got a graph in one single file:
>> Giraph sends the whole graph to one worker while the rest of the workers are just idle.
> Giraph uses the number of InputSplits returned by your
> VertexInputFormat.getSplits() implementation.
>
> For VertexInputFormats that wrap Hadoop TextInputFormat, what you've
> said will be true, and graph input in one small file will all be
> sent to one worker.  As a cheap work-around for this, we've set the
> FileInputFormat split size arbitrarily small:
>            // number of bytes in one meg
>            FileInputFormat.setMaxInputSplitSize(bspJob, 1048576);
>
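
A rough sketch of why capping the split size helps: the number of splits, and so the number of workers that can receive input, grows roughly as file size divided by split size. The split-size formula below follows how Hadoop's FileInputFormat clamps the block size between the minimum and maximum split sizes. The class name, method names, block size and file size are assumptions made up for illustration, and the real implementation also tolerates a slightly oversized final split, so treat the counts as estimates.

```java
// Illustrative only: SplitEstimate and its methods are hypothetical names,
// not part of Hadoop or Giraph.
class SplitEstimate {
    // Clamp the block size between the configured minimum and maximum
    // split sizes, as FileInputFormat does when choosing a split size.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split count: ceiling of fileSize / splitSize.
    static long estimateSplits(long fileSize, long splitSize) {
        if (fileSize <= 0) {
            return 0;
        }
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block
        long fileSize = 10L * 1024 * 1024;  // assumed 10 MB graph file
        long oneMeg = 1048576L;             // the cap used above

        long defaultSplit = splitSize(blockSize, 1L, Long.MAX_VALUE);
        long cappedSplit = splitSize(blockSize, 1L, oneMeg);

        System.out.println("splits without cap: " + estimateSplits(fileSize, defaultSplit));
        System.out.println("splits with 1 MB cap: " + estimateSplits(fileSize, cappedSplit));
    }
}
```

With these assumed sizes, the uncapped case yields a single split (one busy worker), while the 1 MB cap yields about ten.
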
> Additionally, Giraph has re-balancing features that can give work to
> under-used workers in subsequent supersteps, but I haven't played with
> them.  (I'm not sure they would have helped me anyway, as my graph,
> even though it had a small input file, wouldn't fit into memory on one
> worker).
>
>
>> 2: I read through the source code and found a part saying that vertices must be presented
>> in id-order. Is that a task the user has
>> to do or is there a workaround to have vertices not in id-order?
> Sorting the input into Id-order is for the user to do.  There are open
> JIRAs, like GIRAPH-11 [1] to improve the situation here.
>
>
>> 3: The VertexRange class provides the assignment between vertices and workers. Is
>> there a way to override the
>> standard implementation and use a custom assignment system?
> I have no idea, but the work in GIRAPH-11 will probably give a clue
> what's involved.
GIRAPH-11 will change a lot about the way vertices are assigned.  There 
will be an option for hashing, hash ranges, or user-defined ranges.  
There is also a way to control the assignment of vertex ranges to 
workers at some level right now (this will likely change a bit as well 
after GIRAPH-11).
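
To make the difference between hashing and user-defined ranges concrete, here is a minimal sketch of the two ideas. It is not taken from Giraph or from the GIRAPH-11 patch: the class, the method names and the sample bounds are all hypothetical, and the eventual implementation may look quite different.

```java
// Illustrative only: AssignmentSketch is a hypothetical class, not Giraph code.
class AssignmentSketch {
    // Hash assignment: spread vertex ids over workers by hash code.
    // floorMod keeps the result in [0, numWorkers) even for negative hashes.
    static int byHash(long vertexId, int numWorkers) {
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    // Range assignment: upperBounds[i] is the largest id owned by worker i,
    // sorted ascending. Ids past the last bound fall to the last worker.
    static int byRange(long vertexId, long[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (vertexId <= upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length - 1;
    }

    public static void main(String[] args) {
        long[] bounds = {99L, 199L, 299L}; // assumed bounds for three workers
        long vertexId = 150L;
        System.out.println("hash  -> worker " + byHash(vertexId, 3));
        System.out.println("range -> worker " + byRange(vertexId, bounds));
    }
}
```

Hashing needs no knowledge of the id distribution but scatters neighboring ids, while ranges keep contiguous ids together and let the user decide where the boundaries fall.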

In GiraphJob, there is a method

     /**
      * Set the vertex range balancer class (optional)
      *
      * @param vertexRangeBalancerClass Determines how vertex
      *        ranges are balanced prior to each superstep
      */
     final public void setVertexRangeBalancerClass(
             Class<?> vertexRangeBalancerClass) {
         getConfiguration().setClass(VERTEX_RANGE_BALANCER_CLASS,
                                     vertexRangeBalancerClass,
                                     VertexRangeBalancer.class);
     }

By default, we use the StaticBalancer, which doesn't move vertices at all.  
There is also an AutoBalancer that tries to balance the graph based on 
vertices or edges.  You can also write your own.  Hope that helps.
>
>
> ...joe
>
>
> [1]  https://issues.apache.org/jira/browse/GIRAPH-11
>
>
>
>
>
>> Thanks in advance.
>>
>> Kind regards / Mit freundlichen Grüßen
>>
>> Severin Andreas Corsten
>> DHBW-Student Business Informatics 2009 - University Programs
>> IBM Sales&  Distribution, Human Resources
>> WI09N-M
>> ________________________________
>> Phone: 1-408-927-2750
>> Mobile (Germany): 49-160-98976935
>> E-mail: severin.corsten@de.ibm.com
>>
>> Hechtsheimer Str. 2
>> Mainz, 55131
>> Germany

