incubator-giraph-user mailing list archives

From Joseph Boyd <joseph.b...@cbsinteractive.com>
Subject Re: Input And Partitioning
Date Thu, 08 Sep 2011 18:27:24 GMT
On Thu, Sep 8, 2011 at 11:07 AM, Severin Andreas Corsten
<severin.corsten@de.ibm.com> wrote:

> 1: Am I right in the assumption that Giraph does not split the input file by itself?
> Assume that I have a graph in one single file:
> Giraph sends the whole graph to one worker while the rest of the workers are just idle.

Giraph uses the number of InputSplits returned by your
VertexInputFormat.getSplits() implementation.

For VertexInputFormats that wrap Hadoop's TextInputFormat, what you've
said will be true: graph input in one small file will all be
sent to one worker.  As a cheap work-around for this, we've set the
FileInputFormat maximum split size arbitrarily small:
          FileInputFormat.setMaxInputSplitSize(bspJob, 1048576); // bytes in 1 MB
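
To see why a smaller maximum split size spreads one file over more
workers, here is a rough, stand-alone sketch of the split-count
arithmetic that Hadoop's FileInputFormat applies to a splittable file.
The class and method names are made up for illustration, and the 64 MB
block size and 10 MB file size are assumptions, not values from our
job.

```java
// Hypothetical helper, not part of Giraph or Hadoop: it only mimics the
// split-size arithmetic of Hadoop's FileInputFormat for one splittable file.
class SplitCountSketch {
    // Hadoop lets the last split be up to 10% larger than the others.
    static final double SPLIT_SLOP = 1.1;

    static long countSplits(long fileLength, long blockSize,
                            long minSplitSize, long maxSplitSize) {
        if (fileLength <= 0) {
            throw new IllegalArgumentException("sketch only covers non-empty files");
        }
        // Split size is the block size, clamped to [minSplitSize, maxSplitSize].
        long splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
        long remaining = fileLength;
        long splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the final, possibly short or slightly oversized, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long oneMb = 1048576L;
        long fileLength = 10 * oneMb;  // assumed input size
        long blockSize = 64 * oneMb;   // assumed HDFS block size
        // With a 1 MB cap, as in the work-around above.
        System.out.println(countSplits(fileLength, blockSize, 1L, oneMb));
        // With no cap at all.
        System.out.println(countSplits(fileLength, blockSize, 1L, Long.MAX_VALUE));
    }
}
```

Under those assumed sizes the capped case yields 10 splits and the
uncapped case yields 1, which is the "everything goes to one worker"
behaviour you saw.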

Additionally, Giraph has re-balancing features that can give work to
under-used workers in subsequent supersteps, but I haven't played with
them.  (I'm not sure they would have helped me anyway, as my graph,
even though it had a small input file, wouldn't fit into memory on one
worker).


> 2: I read through the source code and found a part saying that vertices must be presented
> in id-order. Is that a task the user has
> to do or is there a workaround to have vertices not in id-order?

Sorting the input into ID order is for the user to do.  There are open
JIRAs, like GIRAPH-11 [1], to improve the situation here.
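
For a small text input, the pre-sorting can be done before the job is
submitted. A minimal sketch follows, assuming one vertex per line with
a numeric id as the first tab-separated field; that line format and
the class name are assumptions for illustration, and the ordering
Giraph actually expects is whatever your vertex id type's comparison
defines. For input too large for one machine's memory you would sort
with a MapReduce job or an external sort instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical pre-processing helper, not part of Giraph.
class SortVerticesById {
    // Assumed line format: "<numeric id>\t<rest of the vertex record>".
    static long vertexId(String line) {
        int tab = line.indexOf('\t');
        String idField = (tab < 0) ? line : line.substring(0, tab);
        return Long.parseLong(idField.trim());
    }

    // Returns a new list ordered by ascending numeric vertex id;
    // the input list is left untouched.
    static List<String> sortById(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingLong(SortVerticesById::vertexId));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("10\t3,7", "2\t10", "7\t2");
        for (String line : sortById(input)) {
            System.out.println(line);
        }
    }
}
```

Note that the ids are compared as numbers, so 10 sorts after 7; a
plain text sort of the file would put "10" before "2".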


> 3: The VertexRange class provides the assignment between vertices and workers. Is there
> a way to override the
> standard implementation and use a custom assignment system?

I have no idea, but the work in GIRAPH-11 will probably give a clue
as to what's involved.



...joe


[1]  https://issues.apache.org/jira/browse/GIRAPH-11





> Thanks in advance.
>
> Kind regards / Mit freundlichen Grüßen
>
> Severin Andreas Corsten
> DHBW-Student Business Informatics 2009 - University Programs
> IBM Sales & Distribution, Human Resources
> WI09N-M
> ________________________________
> Phone: 1-408-927-2750
> Mobile (Germany): 49-160-98976935
> E-mail: severin.corsten@de.ibm.com
>
> Hechtsheimer Str. 2
> Mainz, 55131
> Germany
