Date: Wed, 28 Sep 2016 16:09:20 +0000 (UTC)
From: "Aihua Xu (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-12222) Define port range in property for RPCServer

     [ https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aihua Xu updated HIVE-12222:
----------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0
           Status: Resolved  (was: Patch Available)

Pushed to master. Thanks Xuefu for reviewing.

> Define port range in property for RPCServer
> -------------------------------------------
>
>                 Key: HIVE-12222
>                 URL: https://issues.apache.org/jira/browse/HIVE-12222
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI, Spark
>    Affects Versions: 1.2.1
>         Environment: Apache Hadoop 2.7.0
>                      Apache Hive 1.2.1
>                      Apache Spark 1.5.1
>            Reporter: Andrew Lee
>            Assignee: Aihua Xu
>             Fix For: 2.2.0
>
>         Attachments: HIVE-12222.1.patch, HIVE-12222.2.patch, HIVE-12222.3.patch
>
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. I would need some help reviewing and updating the fields in this JIRA ticket, thanks.
> I notice that in
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is chosen every time the RPC server is created to talk to Spark in the same session.
> Because the port numbers are unpredictable, it is hard to configure a firewall between the HiveCLI RPC server and Spark. In other words, users need to open the whole range of Hive ports from Data Node => HiveCLI (edge node).
> {code}
>  this.channel = new ServerBootstrap()
>       .group(group)
>       .channel(NioServerSocketChannel.class)
>       .childHandler(new ChannelInitializer<SocketChannel>() {
>           @Override
>           public void initChannel(SocketChannel ch) throws Exception {
>             SaslServerHandler saslHandler = new SaslServerHandler(config);
>             final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, group);
>             saslHandler.rpc = newRpc;
>             Runnable cancelTask = new Runnable() {
>                 @Override
>                 public void run() {
>                   LOG.warn("Timed out waiting for hello from client.");
>                   newRpc.close();
>                 }
>             };
>             saslHandler.cancelTask = group.schedule(cancelTask,
>                 RpcServer.this.config.getServerConnectTimeoutMs(),
>                 TimeUnit.MILLISECONDS);
>           }
>       })
> {code}
> 2 main reasons.
> - Most users (from what I see and encounter) use HiveCLI as a command line tool, and in order to use it, they need to log in to the edge node (via SSH). Now, here comes the interesting part.
> This may or may not be true in general, but it is what I observe and encounter from time to time: most users will abuse the resources on that edge node (increasing HADOOP_HEAPSIZE, dumping output to local disk, running huge Python workflows, etc.). This may cause the HS2 process to run into an OOME, choke, and die, along with various other resource issues, including ones that affect login.
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly available. It makes sense to run it on the gateway node or a service node, separate from the HiveCLI.
> The logs are in a different location, and monitoring and auditing are easier when HS2 runs under a daemon user account, so we don't want users to run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file handle, and disk space issues.
> From a security standpoint,
> - Since users can log in to the edge node (via SSH), security on the edge node needs to be fortified and enhanced. This is where the firewall and auditing come in.
> - Regulation/compliance for auditing is another requirement to monitor all traffic. Specifying the ports and locking them down makes this easier, since we can focus on one range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
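
For readers of the archive, the idea in the issue above (bind to a port from a configured range instead of port 0) can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the class name PortRangeSketch, the method names, and the "49152-49155,49160" spec format are assumptions made for this example, and it uses a plain java.net.ServerSocket rather than the Netty ServerBootstrap in RpcServer.java so that it stays self-contained.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the code from the HIVE-12222 patches.
class PortRangeSketch {

  // Parses a spec such as "49152-49155,49160" into a list of candidate ports.
  // The spec format is an assumption made for this example.
  static List<Integer> parsePorts(String spec) {
    List<Integer> ports = new ArrayList<>();
    if (spec == null || spec.trim().isEmpty()) {
      ports.add(0); // 0 keeps the behaviour described in the issue: any free port
      return ports;
    }
    for (String part : spec.split(",")) {
      String[] bounds = part.trim().split("-");
      int low = Integer.parseInt(bounds[0].trim());
      int high = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : low;
      if (low < 0 || high > 65535 || low > high) {
        throw new IllegalArgumentException("Invalid port range: " + part);
      }
      for (int p = low; p <= high; p++) {
        ports.add(p);
      }
    }
    return ports;
  }

  // Tries each candidate in order and returns the first socket that binds.
  static ServerSocket bindFirstAvailable(List<Integer> ports) throws IOException {
    IOException last = null;
    for (int port : ports) {
      try {
        return new ServerSocket(port);
      } catch (IOException e) {
        last = e; // port in use or not permitted; try the next candidate
      }
    }
    throw new IOException("No port available in the configured range", last);
  }
}
```

With something along these lines, a firewall rule only has to cover the configured range, and an empty setting falls back to the random-port behaviour so existing deployments are unaffected.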