From: Florin Diaconeasa
Reply-To: florin.diaconeasa@gmail.com
To: user@hive.apache.org
Date: Thu, 3 Nov 2011 10:12:56 +0200
Subject: Re: High number of input files problems
In-Reply-To: <4F6B25AFFFCAFE44B6259A412D5F9B103994D62F@ExchMBX104.netflix.com>

:( Upgrading is not really an option at this point. However, do you know
whether a bug like this has been fixed in a later release? If so, maybe I
could port the patch to the 0.6 version.

On 2 November 2011 04:16, Steven Wong <swong@netflix.com> wrote:

> I suspect very few people are still using Hive 0.6 or older. Try upgrading.
>
> From: Florin Diaconeasa [mailto:florin.diaconeasa@gmail.com]
> Sent: Monday, October 31, 2011 6:37 AM
> To: user@hive.apache.org
> Subject: High number of input files problems
>
> Hello,
>
> Lately our user base has grown, so our input files have increased
> considerably in size and number.
>
> One of our processing steps runs a query of the form shown at the end
> of this email. My problem is that, apparently, the processing sometimes
> misses some of the input files (in most cases for the 2nd SELECT).
>
> I'm using Hive 0.6 and Hadoop 0.20.2 on 64-bit Debian 5, and we connect
> to a Hive server instance using JDBC. Any idea what parameters I could
> tune, or whether any tickets have been opened for this problem? I have
> searched the Hive JIRA and found nothing so far. The only thing that I
> think might be related is
> https://issues.apache.org/jira/browse/HIVE-1884
>
> SELECT
>     t.a,
>     sum(t.b),
>     sum(t.c),
>     sum(t.d)
> FROM
> (
>     SELECT
>         a,
>         sum(x) as b,
>         sum(y) as c,
>         sum(z) as d
>     FROM T1
>     WHERE ...
>     GROUP BY ...
>
>     UNION ALL
>
>     SELECT
>         a,
>         sum(x) as b,
>         sum(y) as c,
>         sum(z) as d
>     FROM T2
>     WHERE ...
>     GROUP BY ...
>
>     UNION ALL
>
>     SELECT
>         a,
>         sum(x) as b,
>         sum(y) as c,
>         sum(z) as d
>     FROM T3
>     WHERE ...
>     GROUP BY ...
> ) t
> GROUP BY ...
>
> --
> Florin

--
Florin