giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-477) Fetching locality info in InputSplitPathOrganizer causes jobs to hang
Date Fri, 11 Jan 2013 01:35:11 GMT


Eli Reisman commented on GIRAPH-477:

I agree about the strategy change; having the master disseminate this information
might be the right approach. There are a couple of options we should discuss.
I had a couple of ideas that worked when I was originally doing this, but they did not
fit cleanly into the input split organizer. Using the master IPC, or storing it in the ZK
split nodes' names rather than in their data, is very effective, but both have other
downsides. Worth exploring, though.
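
To make the node-name idea concrete, here is a minimal sketch of how host names might be packed into a split node's name and read back. It is only an illustration under assumptions: the class `SplitNodeNameSketch`, the `split-<index>@host1,host2` layout, and the `@` and `,` separators are hypothetical and are not taken from Giraph or from the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; the naming scheme is an assumption, not Giraph's. */
final class SplitNodeNameSketch {
  /** Assumed marker between the split id and the host list. */
  private static final String HOST_MARKER = "@";
  /** Assumed separator between host names. */
  private static final String HOST_SEPARATOR = ",";

  private SplitNodeNameSketch() { }

  /** Build a node name such as "split-7@hostA,hostB". */
  static String encode(int splitIndex, List<String> hosts) {
    return "split-" + splitIndex + HOST_MARKER
        + String.join(HOST_SEPARATOR, hosts);
  }

  /** Recover the host list from a node name; empty if none is encoded. */
  static List<String> decodeHosts(String nodeName) {
    int marker = nodeName.indexOf(HOST_MARKER);
    if (marker < 0 || marker == nodeName.length() - 1) {
      return Collections.emptyList();
    }
    return Arrays.asList(
        nodeName.substring(marker + 1).split(HOST_SEPARATOR));
  }

  /** True if the split named by nodeName has a replica on localHost. */
  static boolean isLocal(String nodeName, String localHost) {
    return decodeHosts(nodeName).contains(localHost);
  }
}
```

With names like these, a worker that lists the children of the splits directory would already hold the host lists and would not need a separate data read per split node.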

> Fetching locality info in InputSplitPathOrganizer causes jobs to hang
> ---------------------------------------------------------------------
>                 Key: GIRAPH-477
>                 URL:
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-477.patch
>
> In the presence of many input splits (>6000 in our case) and input split threads (3000),
> the loop that fetches locality info for all splits from ZooKeeper becomes a bottleneck. A
> few workers aren't able to even iterate once over the list, run into increased GC pauses,
> and eventually time out.
> Furthermore, depending on the cluster configuration, it's not always possible/useful
> to exploit locality.
> We should add a flag so that the feature can be optionally disabled.
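
The proposed flag could look roughly like the sketch below. It is not the attached patch and not the real InputSplitPathOrganizer: the class `SplitOrderingSketch`, the method `order`, and all parameter names are hypothetical, and `localityLookup` merely stands in for the per-split ZooKeeper read. With the flag off the lookup is never called, and each worker just starts at a different offset in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch; not Giraph's actual InputSplitPathOrganizer. */
final class SplitOrderingSketch {
  private SplitOrderingSketch() { }

  /**
   * Order split paths for one worker.
   *
   * @param splitPaths all split paths, in a stable order
   * @param localityLookup stand-in for the per-split locality fetch
   * @param localHost this worker's host name
   * @param workerIndex used to stagger each worker's starting offset
   * @param useLocality the optional flag; false skips every lookup
   */
  static List<String> order(List<String> splitPaths,
      Function<String, Set<String>> localityLookup, String localHost,
      int workerIndex, boolean useLocality) {
    int n = splitPaths.size();
    List<String> rotated = new ArrayList<>(n);
    if (n == 0) {
      return rotated;
    }
    // Rotate so that workers do not all contend for the same first split.
    int start = Math.floorMod(workerIndex, n);
    for (int i = 0; i < n; i++) {
      rotated.add(splitPaths.get((start + i) % n));
    }
    if (!useLocality) {
      return rotated; // locality disabled: no lookups at all
    }
    // Locality enabled: local splits first, rotation order kept otherwise.
    List<String> local = new ArrayList<>();
    List<String> remote = new ArrayList<>();
    for (String path : rotated) {
      Set<String> hosts = localityLookup.apply(path);
      if (hosts != null && hosts.contains(localHost)) {
        local.add(path);
      } else {
        remote.add(path);
      }
    }
    local.addAll(remote);
    return local;
  }
}
```

In this sketch the cost of the disabled path is one pass over the list with no remote calls, which is the behavior the flag is meant to provide on clusters where locality cannot be exploited.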

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
