Date: Wed, 11 Jun 2014 23:01:03 +0000 (UTC)
From: "Mithun Radhakrishnan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-7195) Improve Metastore performance

    [ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028545#comment-14028545 ]

Mithun Radhakrishnan commented on HIVE-7195:
--------------------------------------------

[~sershe]: listPartitions(), etc. do have a max_parts parameter. I'm exploring the possibility of reducing the thrift traffic for partition-operations, for a given number of partitions. That would free us up to transfer metadata for more partitions, without fear of the metastore keeling over from heap-frag, etc.

One way of doing that is to reduce redundancy when specifying multiple partitions. Abstracting how partitions are specified makes it possible to vary and extend this.

> Improve Metastore performance
> -----------------------------
>
>                 Key: HIVE-7195
>                 URL: https://issues.apache.org/jira/browse/HIVE-7195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
>
> Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are:
> * When a client gets all partitions we do not send them an iterator; we create a collection of all data and then pass the object over the network in total
> * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion
> * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
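
To illustrate the "reduce redundancy when specifying multiple partitions" idea from the comment above, here is a minimal sketch in Python. It is not Hive's API or the approach eventually taken in HIVE-7195: the Partition record, its fields, and the compact()/expand() helpers are all hypothetical names chosen for illustration. The assumption is that many partitions of one table share the same column schema and storage format, so those fields can be sent once per group instead of once per partition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical, simplified stand-in for a metastore partition record.
    values: tuple        # partition-key values, e.g. ("2014-06-11", "0")
    columns: tuple       # column schema, often identical across partitions
    input_format: str    # storage format, often identical across partitions
    location: str        # storage path, unique per partition


def compact(partitions):
    """Group partitions by their shared fields so each group carries them once."""
    groups = defaultdict(list)
    for p in partitions:
        shared = (p.columns, p.input_format)
        groups[shared].append((p.values, p.location))
    return [
        {"columns": cols, "input_format": fmt, "partitions": members}
        for (cols, fmt), members in groups.items()
    ]


def expand(compacted):
    """Inverse of compact(): rebuild the full per-partition records."""
    return [
        Partition(values, group["columns"], group["input_format"], location)
        for group in compacted
        for values, location in group["partitions"]
    ]
```

With this shape, a table whose partitions all share one schema and format produces a single group, and only the partition values and locations vary per entry. Because the grouping sits behind compact()/expand(), the way partitions are specified could be changed later without touching callers, which is the kind of abstraction the comment argues for.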