incubator-giraph-dev mailing list archives

From "Avery Ching (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried
Date Sat, 28 Jan 2012 00:34:36 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195286#comment-13195286 ]

Avery Ching commented on GIRAPH-128:
------------------------------------

Thanks for taking a look.  I forgot to upload the original (rb only for that one), hence part
2. 

The main motivation for the obscure case is that it makes debugging simpler.  We often
see errors like serverX:portY, and can use portY to figure out which mapper to look at.  For
example, the default currently starts at 30000.  If I see an error from 30001, then I know
to go to mapper 1 to see its problem, and so on.  If I am running a 900-mapper
job and the port is 31001 or 32001, then I still know to look at mapper partition 1.  If instead
I had 100 as the constant, then for 30101 I would have to check both mapper 1 and mapper
101.  With up to 20 retries per port, we can handle at least 20 simultaneous jobs running
on a single machine that have the same mapper partition id.  First off, that is probably unlikely.
But even if it does happen, 20 is probably more than one machine would handle.  By the
way, port retries are very fast (so I wouldn't worry too much about collisions).
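
To make the arithmetic above concrete, here is a minimal Python sketch (not the Giraph
Java code).  The helper name candidate_partitions and its parameters are hypothetical;
it assumes mapper partition p tries base + p + k * increment for k = 0 .. 19.

```python
BASE_PORT = 30000        # default starting port mentioned above
MAX_BIND_ATTEMPTS = 20   # retry limit mentioned above


def candidate_partitions(port, increment, num_mappers,
                         base_port=BASE_PORT, max_attempts=MAX_BIND_ATTEMPTS):
    """Return every mapper partition that could have ended up on `port`.

    Hypothetical helper: assumes partition p tries
    base_port + p + k * increment for k = 0 .. max_attempts - 1.
    """
    offset = port - base_port
    matches = []
    for partition in range(num_mappers):
        delta = offset - partition
        # delta must be a whole number of retry steps within the retry limit
        if delta >= 0 and delta % increment == 0 and delta // increment < max_attempts:
            matches.append(partition)
    return matches
```

For a 900-mapper job, candidate_partitions(31001, 1000, 900) returns [1], while
candidate_partitions(30101, 100, 900) returns [1, 101], which is the ambiguity
described above.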

Let me resubmit without the whitespace changes and with MAX_BIND_ATTEMPTS made configurable.
                
> RPC port from BasicRPCCommunications should be only a starting port, and retried
> --------------------------------------------------------------------------------
>
>                 Key: GIRAPH-128
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-128
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 0.1.0
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-128.2.patch
>
>
> Currently Giraph uses a base port + the task partition to get the RPC port.  This doesn't
work well when there are multiple Giraph jobs running simultaneously in the same Hadoop
cluster (port conflict).  At the same time, it is nice to use this simple algorithm because
it makes it very easy to debug problems (you can find the troublesome mapper from the RPC
port number).  I will be proposing a simple scheme to retry with another port.  I will round
the total number of mappers up to the nearest power of 10 (let's call that number Z).  Then
I will increment the port number by Z, retrying up to 20 times.  If you have enough ports,
this scheme would guarantee that up to 20 mappers / node would be supported.  It should be
sufficient for most clusters.  At the same time, we still maintain the easy debugging method
since it's still easy to figure out the mapper partition from the port (port % Z = map
partition). 
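
The scheme described above could be sketched roughly as follows.  This is a Python
illustration under stated assumptions, not the actual patch: the function names, the
loopback host, and the use of plain TCP sockets are all hypothetical.  Note that
port % Z recovers the partition only when the base port is itself a multiple of Z.

```python
import socket

MAX_BIND_ATTEMPTS = 20  # retry limit from the description above


def port_increment(num_mappers):
    """Round the mapper count up to a power of 10 (the Z in the text)."""
    z = 1
    while z < num_mappers:
        z *= 10
    return z


def candidate_ports(base_port, partition, num_mappers,
                    max_attempts=MAX_BIND_ATTEMPTS):
    """Ports one mapper would try, in order, skipping any above 65535."""
    z = port_increment(num_mappers)
    ports = [base_port + partition + attempt * z for attempt in range(max_attempts)]
    return [p for p in ports if p <= 65535]


def bind_with_retry(base_port, partition, num_mappers, host="127.0.0.1"):
    """Bind the first free candidate port; return (socket, port)."""
    for port in candidate_ports(base_port, partition, num_mappers):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock, port
        except OSError:
            # Port already taken (e.g. by another job); try the next one.
            sock.close()
    raise OSError("no free port found for partition %d" % partition)
```

With a base port of 30000 and 900 mappers, Z is 1000, so mapper partition 1 would try
30001, 31001, 32001, and so on, and every one of those ports satisfies port % 1000 == 1.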

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
