From: Ashish Thusoo <athusoo@facebook.com>
To: hive-user@hadoop.apache.org
Date: Fri, 11 Jun 2010 16:09:49 -0700
Subject: RE: Dealing with large number of partitions

+1 to that. That should help provided you are running Hadoop 0.20.

Ashish

________________________________
From: wd [mailto:wd@wdicc.com]
Sent: Thursday, June 10, 2010 11:36 PM
To: hive-user@hadoop.apache.org
Subject: Re: Dealing with large number of partitions

Try "set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;" before you run your query; this may help.

2010/6/11 Sammy Yu <syu@brightedge.com>:

Hi,
   I am having an issue with a large number of partitions (about 4,000, each made up of very small files, <10k). Any queries that involve these partitions take an extremely long time to complete (10+ hours), and I was wondering whether there is any easy way in Hive to improve performance without having to merge the files. I can see the map-reduce jobs are taking a long time because there are so many separate raw data files to read. I saw that HIVE-1332 dealt with using HAR files for partitioning. Could this help performance rather than hurt it, given that the queries will be using all the partitions in the HAR file?

Thanks,
Sammy
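
For reference, wd's suggestion in session form. This is a minimal sketch: the table name page_views and partition column dt are hypothetical stand-ins, and the split-size setting is an optional addition beyond what wd mentioned.

    -- Combine many small files per partition into fewer map splits.
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    -- Optional (assumes your Hadoop version honors it): cap how many
    -- bytes may be packed into one combined split, here 256 MB.
    set mapred.max.split.size=268435456;

    -- Hypothetical query over a partitioned table; with the setting
    -- above, Hive builds one map task per combined split instead of
    -- (roughly) one per small file.
    SELECT dt, COUNT(*)
    FROM page_views
    WHERE dt >= '2010-05-01'
    GROUP BY dt;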
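On Sammy's HAR question: HIVE-1332 added per-partition archiving rather than a new partitioning scheme. Below is a sketch of the commands, assuming a Hive build that ships the feature; the table and partition spec are again hypothetical. Archiving packs a partition's many small files into one HAR, which eases NameNode pressure, but HAR reads go through an extra layer of indirection, so it is not guaranteed to make the queries themselves faster.

    -- Enable the archiving feature (off by default).
    set hive.archive.enabled=true;

    -- Pack one partition's files into a single HAR; the partition
    -- stays queryable in place.
    ALTER TABLE page_views ARCHIVE PARTITION (dt='2010-06-01');

    -- Reverse it if query latency regresses.
    ALTER TABLE page_views UNARCHIVE PARTITION (dt='2010-06-01');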