Subject: Hive issues when using a large number of partitions
From: Suresh Krishnappa <suresh.krishnappa@gmail.com>
To: user@hive.apache.org
Date: Thu, 7 Mar 2013 20:01:19 +0530

Hi All,
I have a Hadoop cluster with data spread across a large number of directories (> 10,000).
To run Hive queries over this data, I created an external partitioned table and registered each directory as a partition using one ALTER TABLE ... ADD PARTITION statement per directory; roughly what I run is sketched below.
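For reference, my statements look roughly like this (table name, columns, and paths are simplified placeholders, not my real schema):

  -- Placeholder table and columns, not my real schema:
  CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
  PARTITIONED BY (dt STRING)
  LOCATION '/data/events';

  -- Then one statement per directory, repeated ~10,000 times:
  ALTER TABLE events ADD PARTITION (dt='2013-01-01')
  LOCATION '/data/events/2013-01-01';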
Is there a better way to create a Hive external table over a large number of directories?
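For example, if I am reading the ALTER TABLE syntax correctly, several partitions can be batched into a single statement, along these lines (again with placeholder names); would that be expected to behave better?

  ALTER TABLE events ADD
    PARTITION (dt='2013-01-01') LOCATION '/data/events/2013-01-01'
    PARTITION (dt='2013-01-02') LOCATION '/data/events/2013-01-02';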

Also, I am facing the following issues due to the large number of partitions:
1) The DDL operations of creating the table and adding partitions take a very long time; adding around 10,000 partitions takes about an hour.
2) I get a Java out-of-memory exception when adding more than 50,000 partitions.
3) I sometimes get a Java out-of-memory exception on SELECT queries when there are more than 10,000 partitions.

What is the recommended limit on the number of partitions we can create for a Hive table?
Are there any configuration settings in Hive/Hadoop to support a large number of partitions? For instance, is heap size the right knob to turn (see the sketch below)?
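This is only a guess on my part: I have been wondering whether raising the heap for the Hive CLI/metastore JVM would help, e.g. via hive-env.sh (the value here is arbitrary):

  # conf/hive-env.sh -- the value is an arbitrary guess, not a recommendation
  export HADOOP_HEAPSIZE=2048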

I am using Hive 0.10.0. I re-ran the tests after replacing Derby with PostgreSQL as the metastore and still faced similar issues; the metastore connection settings I changed are sketched below.
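For completeness, the swap to PostgreSQL was done roughly like the following in hive-site.xml; host, port, and database name here are placeholders:

  <!-- hive-site.xml: metastore backed by PostgreSQL (placeholder URL) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://dbhost:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>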

Would appreciate any inputs on this.

Thanks
Suresh
