hive-dev mailing list archives

From Andrew Lee <>
Subject Re: Hard Coded 0 to assign RPC Server port number when hive.execution.engine=spark
Date Tue, 20 Oct 2015 04:12:04 GMT
Hi Xuefu,

I agree for HS2, since HS2 usually runs on a gateway or service node inside the cluster environment.
In my case, it is actually a matter of additional security.
A separate edge node (not running HS2; HS2 runs on another box) is used for HiveCLI.
We don't allow data/worker nodes to talk to the edge node on random ports: all ports must
be registered or explicitly specified, and monitored.
That's why I am asking for this feature. Otherwise, opening up ports 1024-65535 from the
data/worker nodes to the edge node is
a bad idea and bad practice for network security.  :( 

From: Xuefu Zhang <>
Sent: Monday, October 19, 2015 1:12 PM
Subject: Re: Hard Coded 0 to assign RPC Server port number when hive.execution.engine=spark

Hi Andrew,

RpcServer is an instance launched for each user session. In the case of Hive
CLI, which serves a single user, what you said makes sense and the port
number can be made configurable. In the context of HS2, however, there are
multiple user sessions and the total is unknown in advance. While the +1 scheme
works, there can still be a band of ports that might eventually be opened.
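The "+1" scheme under discussion could be sketched roughly as follows. This is a hypothetical illustration using plain java.net sockets rather than the actual Netty code in Hive; the method name `bindWithFallback` and its parameters are invented for the example:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortScan {
    // Try ports basePort, basePort+1, ..., basePort+maxAttempts-1 and
    // return the first ServerSocket that binds successfully.
    static ServerSocket bindWithFallback(int basePort, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return new ServerSocket(basePort + i);
            } catch (IOException e) {
                last = e; // port already taken; try the next one
            }
        }
        throw last; // the whole band was occupied
    }

    public static void main(String[] args) throws IOException {
        // Occupy one port, then show the fallback skips past it.
        try (ServerSocket first = bindWithFallback(38100, 10);
             ServerSocket second = bindWithFallback(first.getLocalPort(), 10)) {
            System.out.println(second.getLocalPort() > first.getLocalPort());
        }
    }
}
```

Note how this makes Xuefu's point concrete: with N concurrent HS2 sessions, up to N consecutive ports from the base would be in use, so a firewall rule would still have to allow a band rather than a single port.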

From a different perspective, we expect that either Hive CLI or HS2 resides
on a gateway node, which is in the same network as the data/worker
nodes. In that configuration, the firewall issue you mentioned doesn't apply.
Such a configuration is what we usually see with our enterprise customers,
and it is what we recommend. I'm not sure why you would want your Hive users
to launch Hive CLI anywhere outside your cluster, which doesn't seem secure
if security is your concern.


On Mon, Oct 19, 2015 at 7:20 AM, Andrew Lee <> wrote:

> Hi All,
> I notice that in
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/
> the port number is assigned 0, which means it will be a random port
> every time the RPC Server is created
> to talk to Spark in the same session.
> Is there any reason why this port number is not a configurable property that
> follows the same +1 rule if the port is taken,
> just like Spark's configuration for the Spark Driver, etc.? Because of this,
> it is causing problems when configuring a firewall between the
> HiveCLI RPC Server and Spark, due to the unpredictable port numbers here. In
> other words, users need to open the whole Hive port range
> from the Data Nodes => HiveCLI (edge node).
> channel = new ServerBootstrap()
>       .group(group)
>       .channel(NioServerSocketChannel.class)
>       .childHandler(new ChannelInitializer<SocketChannel>() {
>           @Override
>           public void initChannel(SocketChannel ch) throws Exception {
>             SaslServerHandler saslHandler = new SaslServerHandler(config);
>             final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>             saslHandler.rpc = newRpc;
>             Runnable cancelTask = new Runnable() {
>                 @Override
>                 public void run() {
>                   LOG.warn("Timed out waiting for hello from client.");
>                   newRpc.close();
>                 }
>             };
>             saslHandler.cancelTask = group.schedule(cancelTask,
>                 RpcServer.this.config.getServerConnectTimeoutMs(),
>                 TimeUnit.MILLISECONDS);
>           }
>       })
>       .option(ChannelOption.SO_BACKLOG, 1)
>       .option(ChannelOption.SO_REUSEADDR, true)
>       .childOption(ChannelOption.SO_KEEPALIVE, true)
>       .bind(0)
>       .sync()
>       .channel();
>     this.port = ((InetSocketAddress) channel.localAddress()).getPort();
> I'd appreciate any feedback, and whether a JIRA is required to keep track of this
> conversation. Thanks.
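The behavior described above, where `.bind(0)` hands back an arbitrary ephemeral port, can be seen with plain java.net sockets. This is a minimal standalone illustration of OS port assignment, not the Hive/Netty code itself:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    public static void main(String[] args) throws IOException {
        // Binding to port 0 asks the OS to pick any free ephemeral port,
        // so each bind may land on a different, unpredictable number --
        // which is exactly what makes firewall rules hard to write.
        try (ServerSocket s1 = new ServerSocket(0);
             ServerSocket s2 = new ServerSocket(0)) {
            System.out.println(s1.getLocalPort() > 0);
            System.out.println(s1.getLocalPort() != s2.getLocalPort());
        }
    }
}
```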