Date: Fri, 17 Feb 2017 11:10:41 +0000 (UTC)
From: "Mass Dosage (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-15965) Metastore incorrectly re-uses a broken database connection

     [ https://issues.apache.org/jira/browse/HIVE-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mass Dosage updated HIVE-15965:
-------------------------------
    Attachment: hive.log

Log file from a test that reproduces the issue

> Metastore incorrectly re-uses a broken database connection
> ----------------------------------------------------------
>
>                 Key: HIVE-15965
>                 URL: https://issues.apache.org/jira/browse/HIVE-15965
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: storage-2.2.0
>            Reporter: Mass Dosage
>         Attachments: hive.log
>
>
> *Background*
> In our setup, a shared standalone MetaStore server runs on EMR, is accessed by various clients (Hive CLI, HiveServer2, Spark, etc.) and uses an external MariaDB database as the MetaStore DB. We found that the MetaStore (or rather the underlying DataNucleus/BoneCP combination) keeps re-using the same DB connections even after they have been closed unexpectedly and are no longer usable.
> For instance, due to a bug in the MariaDB JDBC driver v1.3.6 (see https://jira.mariadb.org/browse/CONJ-270), a huge query containing more than 8,000 parameter placeholders (e.g. partition IDs in a {{get_partitions_by_expr}} call) yields a {{java.nio.BufferOverflowException}}, and the driver itself closes the SQL connection.
> Ultimately, all further MetaStore Thrift calls are aborted because {{bonecp.ConnectionHandle.prepareStatement()}} fails on the closed connection.
> DataNucleus catches these failures and translates them into a corresponding {{JDOException}}, only for the MetaStore to "ignore" it. {{RetryingHMSHandler}} does keep retrying the failing operation, but by then retrying is pointless: every attempt fails for as long as the SQL connection stays closed. See the attached MetaStore log [^hive.log] (captured from Hive 2.1.1 running on Windows in the Eclipse IDE) for details.
>
> *Proposed behavior*
> We suggest that the MetaStore automatically renew the DB connection whenever:
> * the connection is closed by one of the underlying frameworks (DataNucleus, BoneCP, JDBC driver); or
> * a query timeout is detected.
>
> This feature should be optional and configurable (disabled by default for backward compatibility). Reconnection failures could be treated as fatal errors that cause the MetaStore to terminate immediately.
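A minimal sketch of the proposed behavior, written against plain JDBC rather than DataNucleus or BoneCP, may help make the request concrete. It is hypothetical and not Hive code: {{ConnectionRenewer}}, {{ConnectionFactory}}, {{SqlWork}} and the {{renewEnabled}} flag are invented names. It assumes that a broken connection is visible through {{Connection.isClosed()}} or {{Connection.isValid(int)}}, and that the driver reports query timeouts as {{SQLTimeoutException}}, which not every driver does.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLTimeoutException;

/**
 * Hypothetical sketch, not Hive code: holds one JDBC connection and, when
 * renewal is enabled, replaces it once it is closed, fails validation, or a
 * query times out. All names here are illustrative assumptions.
 */
final class ConnectionRenewer {

    /** Opens a fresh JDBC connection (e.g. via DriverManager or a pool). */
    @FunctionalInterface
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    /** One unit of database work run against the current connection. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run(Connection conn) throws SQLException;
    }

    private final ConnectionFactory factory;
    private final boolean renewEnabled;       // proposed default: false
    private final int validationTimeoutSec;   // passed to Connection.isValid
    private Connection conn;
    private int renewals;

    ConnectionRenewer(ConnectionFactory factory, boolean renewEnabled,
                      int validationTimeoutSec) throws SQLException {
        this.factory = factory;
        this.renewEnabled = renewEnabled;
        this.validationTimeoutSec = validationTimeoutSec;
        this.conn = factory.open();
    }

    synchronized int renewals() {
        return renewals;
    }

    synchronized <T> T execute(SqlWork<T> work) throws SQLException {
        // Check before use: a connection the driver closed earlier is
        // replaced instead of being handed out again.
        if (renewEnabled && !isUsable()) {
            renew();
        }
        try {
            return work.run(conn);
        } catch (SQLException e) {
            // After a timeout, or if the driver closed the connection while
            // failing, renew now so the caller's next retry gets a fresh one.
            if (renewEnabled && (e instanceof SQLTimeoutException || !isUsable())) {
                renew();
            }
            throw e; // the retry decision stays with the caller
        }
    }

    private boolean isUsable() {
        try {
            return conn != null && !conn.isClosed() && conn.isValid(validationTimeoutSec);
        } catch (SQLException e) {
            return false; // a connection we cannot even probe is not usable
        }
    }

    private void renew() {
        try {
            if (conn != null) {
                conn.close();
            }
        } catch (SQLException ignored) {
            // the old connection is already broken; closing is best-effort
        }
        try {
            conn = factory.open();
            renewals++;
        } catch (SQLException e) {
            // The issue suggests treating a failed reconnect as fatal.
            throw new IllegalStateException("could not re-open the MetaStore DB connection", e);
        }
    }
}
```

The original exception is still rethrown, so a retry layer such as {{RetryingHMSHandler}} keeps deciding whether to retry; the difference is that the retry runs on a fresh connection instead of the closed one. Renewal stays off unless the flag is set, in line with the backward-compatibility requirement, and a failed reconnect surfaces as an unchecked {{IllegalStateException}} that a caller could treat as fatal.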