Subject: Re: Hive Metastore Bottleneck
From: Gautam
To: user@hive.apache.org
Date: Wed, 30 Mar 2016 14:53:32 -0700

Can you elaborate on where you see the bottleneck? A general overview of your access path would be useful; for instance, whether you're accessing the Hive metastore via HiveServer2, from WebHCat using the embedded CLI, or something else.

Have you tried putting multiple metastores behind a load balancer? The metastore is just a Thrift service over MySQL, so you can run multiple instances pointing at the same backend database.
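The client side of that setup is just a comma-separated list of metastore URIs in hive-site.xml (or the load balancer's address, if you front the instances with a VIP). The hostnames below are made up, but the property name is the standard one; the client connects to the first URI it can reach and falls back to the others:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
  </property>

Each metastore instance talks to the same MySQL backend, so this spreads the Thrift load without any change to the database.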

On Wed, Mar 30, 2016 at 2:28 PM, Udit Mehta <umehta@groupon.com> wrote:

> Hi all,
>
> We are currently running Hive in production and staging, with the metastore connecting to a MySQL database in the backend. Traffic to the metastore is higher in production than in staging, which is expected. We have had a sudden increase in traffic, which has made metastore operations take a lot longer than before. The same query takes much less time on staging because of the lighter traffic on that cluster.
>
> We tried increasing the heap space for the metastore process and bumping up the memory for the MySQL database. Neither change seemed to help much, and we still see delays. Is there any other config we can increase to counter this increased traffic? I am looking at the config for max threads as well, but I'm not sure if this is the right path ahead.
>
> I'm wondering if the metastore is the bottleneck here or if I'm missing something.
>
> Looking forward to your reply,
> Udit
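On the max-threads question in the quoted mail: the metastore's Thrift worker pool is sized by hive.metastore.server.min.threads and hive.metastore.server.max.threads in hive-site.xml. A sketch of what raising the cap might look like (the value here is only illustrative; check the defaults for your Hive version first):

  <property>
    <name>hive.metastore.server.max.threads</name>
    <value>2000</value>
  </property>

If the slow part is the SQL against the backing database rather than the Thrift thread pool, though, raising this won't help much.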



--
"If you really want something in this life, you h= ave to work for it. Now, quiet! They're about to announce the lottery n= umbers..."