giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-359) Parallelize the input (loading) / output (storing)
Date Mon, 08 Oct 2012 22:10:02 GMT


Eli Reisman commented on GIRAPH-359:

By write speeds, I take it you mean the stage after the supersteps finish, when the final
output is written to HDFS? Folks who run Giraph with more workers per machine seem to report
this issue; for those of us who have logged more time in "fewer workers per machine" setups,
this has never been a particularly long part of a job run. Could that be related (too many
workers per machine vying to do IO at once, etc.), or is there a more general reason this is
happening? Do you have an idea yet of how you want to approach fixing this problem?

When I say it wasn't a problem with my setup, I mean the write stage took about as long
as an average superstep did during the calculation stages. That just wasn't very long for
that setup and amount of input data. I know that with other hardware setups or input data
amounts the calculation supersteps were longer too. Would you still say the write takes as
long as your average superstep, or is it much longer? Shorter? Just curious.

> Parallelize the input (loading) / output (storing)
> --------------------------------------------------
>                 Key: GIRAPH-359
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
> Often we find that our write rates aren't great.  This could likely be improved by parallelizing
> the input/output with multi-threading.
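
For illustration only, here is a rough sketch of what "parallelizing the output with multi-threading" could look like in plain Java. It is not Giraph code and not the approach taken in GIRAPH-359: the class `ParallelWriteSketch`, the methods `writePartition` and `writeAll`, and the use of in-memory buffers in place of HDFS files are all assumptions made up for this example. Each partition is handed to a fixed-size thread pool and serialized independently.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from Giraph.
class ParallelWriteSketch {

    // Serializes one partition into its own buffer, one vertex per line.
    // The buffer stands in for a per-partition output file.
    static String writePartition(List<String> vertices) {
        StringWriter out = new StringWriter();
        for (String vertex : vertices) {
            out.write(vertex);
            out.write('\n');
        }
        return out.toString();
    }

    // Submits every partition to a fixed-size thread pool and returns the
    // serialized results in the same order as the input partitions.
    static List<String> writeAll(List<List<String>> partitions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (List<String> partition : partitions) {
                Callable<String> task = () -> writePartition(partition);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // blocks until that partition is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The `threads` argument caps how many partitions are written at once. That is the knob that would matter if many workers on one machine are already competing for IO, as the comment above speculates.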
