giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Giraph 1.0.0 - Netty port allocation
Date Fri, 22 Nov 2013 19:23:46 GMT
The reason is actually simple.  If you run more than one Giraph worker 
per machine, there will be a port conflict.  Worse yet, imagine multiple 
Giraph jobs running simultaneously on a cluster; hence the port-increment 
strategy.  It would be straightforward to add a configurable option to 
use a single port for situations such as yours, though (especially since 
you know where the code is now).
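
As a rough sketch of what such an option might look like (the property 
name giraph.nettyUseFixedPort and this wiring are hypothetical, not an 
existing Giraph setting):

    // Hypothetical fixed-port option, shown only as a sketch.
    boolean useFixedPort = conf.getBoolean("giraph.nettyUseFixedPort", false);
    int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf);
    if (!useFixedPort) {
      // Current behavior: offset the base port by the task partition id.
      bindPort += conf.getTaskPartition();
    }

You would still want the bind-retry loop for the multiple-workers-per-machine 
case, but with the option enabled every worker on a different host would bind 
the same base port.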

Avery

On 11/22/13 11:19 AM, Larry Compton wrote:
> Avery,
>
> It looks like the ports are being allocated the way we suspected 
> (30000 + task ID). That's a problem for us because we'll have to open 
> a wide bank of ports (the SAs want to minimize open ports) and also 
> keep them available for use by Giraph. Ideally, the port allocation 
> would take the host into consideration. If you ask for 200 workers and 
> they're each running on a different host, port 30000 could be used by 
> every Netty server. The way it's working now, a different port is 
> being allocated per worker, which appears unnecessary. Is there a 
> reason a different port is used per worker/task?
>
> Is this still the way ports are allocated in Giraph 1.1.0?
>
> Larry
>
>
> On Fri, Nov 22, 2013 at 1:18 PM, Avery Ching <aching@apache.org> wrote:
>
>     The port logic is a bit complex, but all encapsulated in
>     NettyServer.java (see below).
>
>     If nothing else is running on those ports and you really only have
>     one Giraph worker per port, you should be good to go.  Can you look
>     at the logs for the worker that is trying to bind to a port other
>     than base port + taskId?
>
>
>         int taskId = conf.getTaskPartition();
>         int numTasks = conf.getInt("mapred.map.tasks", 1);
>         // Number of workers + 1 for master
>         int numServers = conf.getInt(GiraphConstants.MAX_WORKERS, numTasks) + 1;
>         int portIncrementConstant =
>             (int) Math.pow(10, Math.ceil(Math.log10(numServers)));
>         int bindPort = GiraphConstants.IPC_INITIAL_PORT.get(conf) + taskId;
>         int bindAttempts = 0;
>         final int maxIpcPortBindAttempts = MAX_IPC_PORT_BIND_ATTEMPTS.get(conf);
>         final boolean failFirstPortBindingAttempt =
>             GiraphConstants.FAIL_FIRST_IPC_PORT_BIND_ATTEMPT.get(conf);
>
>         // Simple handling of port collisions on the same machine while
>         // preserving debugability from the port number alone.
>         // Round up the max number of workers to the next power of 10 and use
>         // it as a constant to increase the port number with.
>         while (bindAttempts < maxIpcPortBindAttempts) {
>           this.myAddress = new InetSocketAddress(localHostname, bindPort);
>           if (failFirstPortBindingAttempt && bindAttempts == 0) {
>             if (LOG.isInfoEnabled()) {
>               LOG.info("start: Intentionally fail first " +
>                   "binding attempt as giraph.failFirstIpcPortBindAttempt " +
>                   "is true, port " + bindPort);
>             }
>             ++bindAttempts;
>             bindPort += portIncrementConstant;
>             continue;
>           }
>
>           try {
>             Channel ch = bootstrap.bind(myAddress);
>             accepted.add(ch);
>
>             break;
>           } catch (ChannelException e) {
>             LOG.warn("start: Likely failed to bind on attempt " +
>                 bindAttempts + " to port " + bindPort, e);
>             ++bindAttempts;
>             bindPort += portIncrementConstant;
>           }
>         }
>         if (bindAttempts == maxIpcPortBindAttempts || myAddress == null) {
>           throw new IllegalStateException(
>               "start: Failed to start NettyServer with " +
>                   bindAttempts + " attempts");
>         }
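>
>     To make the increment concrete (the numbers below are only an
>     illustration, not taken from this thread): with 200 workers plus
>     one master,
>
>         numServers            = 200 + 1 = 201
>         portIncrementConstant = 10^ceil(log10(201)) = 1000
>         first try (taskId 47) = 30000 + 47 = 30047
>         retry 1               = 30047 + 1000 = 31047
>         retry 2               = 31047 + 1000 = 32047
>
>     so bind retries can land well outside a 30000-30100 range.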
>
>
>
>     On 11/22/13 9:15 AM, Larry Compton wrote:
>
>         My teammates and I are running Giraph on a cluster where a
>         firewall is configured on each compute node. We had 100 ports
>         opened on the compute nodes, which we thought would be more
>         than enough to accommodate a large number of workers. However,
>         we're unable to go beyond about 90 workers with our Giraph
>         jobs, due to Netty ports being allocated outside of the range
>         (30000-30100). We're not sure why this is happening. We
>         shouldn't be running more than one worker per compute node, so
>         we were assuming that only port 30000 would be used, but we're
>         routinely seeing Giraph try to use ports greater than 30100
>         when we request close to 100 workers. This leads us to believe
>         that a simple one-up numbering scheme is being used that
>         doesn't take the host into consideration, although this is
>         only speculation.
>
>         Is there a way around this problem? Our system admins
>         understandably balked at opening 1000 ports.
>
>         Larry
>
>
>
>

