giraph-dev mailing list archives

From "Eli Reisman (JIRA)"
Subject [jira] [Created] (GIRAPH-308) Giraph consistently creates 10% more InputSplits than one would expect
Date Sun, 19 Aug 2012 18:27:37 GMT
Eli Reisman created GIRAPH-308:

             Summary: Giraph consistently creates 10% more InputSplits than one would expect
                 Key: GIRAPH-308
             Project: Giraph
          Issue Type: Bug
          Components: graph
    Affects Versions: 0.2.0
            Reporter: Eli Reisman
            Priority: Minor
             Fix For: 0.2.0

While doing a lot of instrumented scale-out runs, including runs to test GIRAPH-246 and
GIRAPH-301 (among other patches), I have seen that the calculation:

(# of MB in input files) / (giraph.splitmb setting) == # of InputSplits to expect

does not arrive at the number of splits one would expect. I would expect an extra split now
and then from rounding off fractional amounts in a calculation like this one, but I am
consistently seeing more than that: roughly 10% more splits than expected, and this holds
across runs with many different data load sizes.
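To make the gap concrete, here is a small Java sketch comparing the formula above with one possible source of extra splits. It assumes, without checking it against Giraph's code, that giraph.splitmb acts as a split size and that splits are cut per input file (Hadoop's FileInputFormat does not let a split span two files), so each file's leftover fraction rounds up to its own split. The file sizes are made up, the per-file model is simplified, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Giraph's actual split code: compares the
 * "total MB / giraph.splitmb" estimate with a simplified per-file model in which
 * no split spans two input files, so each file's leftover fraction becomes a split.
 */
final class SplitEstimate {
    private SplitEstimate() {}

    /** Integer ceiling of a / b for positive values. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Estimate from the formula in the report: total input MB over split MB, rounded up. */
    static long naiveSplits(long[] fileSizesMb, long splitMb) {
        long totalMb = 0;
        for (long size : fileSizesMb) {
            totalMb += size;
        }
        return ceilDiv(totalMb, splitMb);
    }

    /** Simplified per-file model: each file is cut on its own and rounded up separately. */
    static long perFileSplits(long[] fileSizesMb, long splitMb) {
        long splits = 0;
        for (long size : fileSizesMb) {
            splits += ceilDiv(size, splitMb);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical data load: ten 100 MB files with a 64 MB split size.
        long[] files = new long[10];
        Arrays.fill(files, 100L);
        long splitMb = 64;
        System.out.println("formula estimate: " + naiveSplits(files, splitMb));   // 16
        System.out.println("per-file model:   " + perFileSplits(files, splitMb)); // 20
    }
}
```

Under this model the rounding error is paid once per file rather than once per job, so the gap grows with the number of input files. Whether this accounts for the roughly 10% seen in these runs is not established here.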

If there is a simple explanation, perhaps I'll find it in the code, but either way I wanted
to post a JIRA because this behavior is counterintuitive and suggests we should change how
giraph.splitmb works so that users get the number of input splits they expect.
In memory-scarce use cases, I am finding that if a worker reads even one split too many
for a given data load, it becomes overloaded and fails. Knowing with some precision how many
workers to allocate for a given data load has been the key to scaling out under scarce
resources here. Looking at these numbers now while testing GIRAPH-301 (which is meant to help
ensure the split-reading load is spread evenly among workers), I see that this discrepancy has
fooled me at times in the past, even when I set the -w and -Dgiraph.splitmb options carefully.
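The sizing arithmetic behind choosing -w can be sketched as follows, assuming a hypothetical per-worker limit on how many splits fit in memory and an even spread of splits across workers (the goal of GIRAPH-301). The split counts and the limit are made-up numbers, and the class and method names are illustrative only; the point is how an estimate that is about 10% low leaves the busiest worker one split over its limit.

```java
/**
 * Hypothetical sketch: sizing -w when each worker can hold at most a fixed number
 * of input splits in memory, assuming splits are spread as evenly as possible.
 */
final class WorkerEstimate {
    private WorkerEstimate() {}

    /** Fewest workers such that no worker must read more than maxSplitsPerWorker splits. */
    static long workersNeeded(long splits, long maxSplitsPerWorker) {
        return (splits + maxSplitsPerWorker - 1) / maxSplitsPerWorker;
    }

    /** With an even spread, the busiest worker reads ceil(splits / workers) splits. */
    static long busiestWorkerLoad(long splits, long workers) {
        return (splits + workers - 1) / workers;
    }

    public static void main(String[] args) {
        long maxSplitsPerWorker = 5; // hypothetical memory limit per worker
        long expectedSplits = 100;   // hypothetical count predicted by the formula
        long actualSplits = 110;     // roughly 10% more, as described in the report

        long workers = workersNeeded(expectedSplits, maxSplitsPerWorker);
        System.out.println("workers chosen from estimate: " + workers); // 20
        System.out.println("busiest worker actually reads: "
                + busiestWorkerLoad(actualSplits, workers));             // 6
        System.out.println("workers really needed: "
                + workersNeeded(actualSplits, maxSplitsPerWorker));      // 22
    }
}
```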

At the very least, it would be nice to hear from someone who knows what is going on here, so
that there is a definitive post on this matter that folks can refer to in the future when
exploring a use case like mine. Many users here will be in the same boat as me, of course :)

Thanks in advance.

This message is automatically generated by JIRA.