Date: Mon, 1 Aug 2011 12:26:09 +0000 (UTC)
From: "Chinna Rao Lalam (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <999926735.22738.1312201569986.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1259323368.2868.1309843761944.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HIVE-2254) Provide an automatic recovery feature for Hive Server in case of failure

[
https://issues.apache.org/jira/browse/HIVE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chinna Rao Lalam updated HIVE-2254:
-----------------------------------

    Attachment: HIVE-2254.patch

> Provide an automatic recovery feature for Hive Server in case of failure
> ------------------------------------------------------------------------
>
>                 Key: HIVE-2254
>                 URL: https://issues.apache.org/jira/browse/HIVE-2254
>             Project: Hive
>          Issue Type: New Feature
>          Components: Clients, Query Processor, Server Infrastructure
>    Affects Versions: 0.5.0, 0.7.1
>         Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
>            Reporter: Chinna Rao Lalam
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-2254.patch, Hive Automatic Recovery Solution.pdf
>
>
> *Motivation*
> We run log analysis with Hive by submitting queries through Hive Server. We have provided NameNode HA and JobTracker HA for high availability, but Hive Server is currently a single point of failure. If the machine running Hive Server goes down or breaks, the Hive service is unavailable until someone notices the failure and brings the server back up, and our log analysis stops in the meantime. To avoid this problem, we need an automatic mechanism that detects the failure and keeps the server highly available.
> *Proposal*
> Deploy two Hive Servers. One acts as the active server while the other is a hot standby. This requires a mechanism that decides which server is active and which is standby, plus a failure detection mechanism that detects when the active server is down or broken and triggers the switchover (standby to active). The failure detection mechanism will be based on ZooKeeper (HA Agent).
> Clients of Hive Server should be configured with the addresses of both servers. When obtaining a connection, the client detects the active Hive Server and connects to it.
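
The client-side step of the proposal (configure both addresses, find the active server, connect to it) might look roughly like the sketch below. It is only an illustration, not the code in HIVE-2254.patch. The names `pick_active` and `is_reachable`, the host names, and the port are all hypothetical, and it assumes that only the active server accepts client connections, so that a successful TCP connect can stand in for "is active". A real client might instead ask ZooKeeper which server currently holds the active role.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def is_reachable(addr: Address, timeout: float = 2.0) -> bool:
    """Hypothetical probe: treat a server as active if it accepts a TCP connection.

    Assumption (not stated in the issue): the standby does not accept
    client connections, so reachability implies the active role.
    """
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def pick_active(
    servers: Iterable[Address],
    probe: Callable[[Address], bool] = is_reachable,
) -> Optional[Address]:
    """Return the first configured server that the probe reports as active."""
    for addr in servers:
        if probe(addr):
            return addr
    return None  # neither server is currently active
```

A client would call `pick_active([("hive-a.example", 10000), ("hive-b.example", 10000)])` before opening its connection and retry after a short delay if the result is `None`, which covers the window while a switchover is still in progress.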
> If Hive Server goes down while a query is executing, the query must be resubmitted once Hive Server is restarted, yet the query that was already running continues in the background. Continuing that execution serves no purpose and wastes cluster resources. In this solution, once the active server goes down, the standby becomes the active server and ensures that the execution already in progress (Hive tasks and MapRed jobs) is stopped.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira