hive-user mailing list archives

From Steven Wong <sw...@netflix.com>
Subject RE: High number of input files problems
Date Wed, 02 Nov 2011 02:16:18 GMT
I suspect very few people are still using Hive 0.6 or older. Try upgrading.


From: Florin Diaconeasa [mailto:florin.diaconeasa@gmail.com]
Sent: Monday, October 31, 2011 6:37 AM
To: user@hive.apache.org
Subject: High number of input files problems

Hello,

Our user base has grown lately, so our input files have increased considerably in both size
and number.

One of our processing steps runs a query of the form shown at the end of this email. The
problem is that the processing sometimes appears to miss some of the input files (in most
cases for the second SELECT).

I'm using Hive 0.6 and Hadoop 0.20.2 on 64-bit Debian 5, and we connect to a Hive server
instance over JDBC. Does anyone have an idea which parameters I could tune, or know of any
tickets that have been opened for this problem? I have searched the Hive JIRA but found nothing
so far. The only issue I think might be related is https://issues.apache.org/jira/browse/HIVE-1884

SELECT
            t.a,
            sum(t.b),
            sum(t.c),
            sum(t.d)
FROM
(
            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T1
            WHERE ...
            GROUP BY ...

UNION ALL

            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T2
            WHERE ...
            GROUP BY ...

UNION ALL

            SELECT
                        a,
                        sum(x) as b,
                        sum(y) as c,
                        sum(z) as d
            FROM T3
            WHERE ...
            GROUP BY ...
) t

GROUP BY ...
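For reference, here is a minimal Python sketch (not HiveQL, and not the poster's actual data) of what the query above computes. It assumes each branch's elided WHERE clause has already been applied and that both elided GROUP BY clauses group on `a` alone; the helper names `branch` and `outer` and the sample rows are hypothetical. Because UNION ALL keeps every row and the outer query re-sums per-key sums, the result should equal one aggregation over all filtered rows of T1, T2 and T3, so a missed input file shows up as lower totals for the affected keys.

```python
from collections import defaultdict

def branch(rows):
    """Inner SELECT: per-key sums of x, y, z (assumes GROUP BY a)."""
    acc = defaultdict(lambda: [0, 0, 0])
    for a, x, y, z in rows:
        acc[a][0] += x
        acc[a][1] += y
        acc[a][2] += z
    return [(a, b, c, d) for a, (b, c, d) in acc.items()]

def outer(*branches):
    """UNION ALL of the branch results, then outer per-key sums (assumes GROUP BY t.a)."""
    acc = defaultdict(lambda: [0, 0, 0])
    for rows in branches:  # UNION ALL: every row of every branch is kept, no dedup
        for a, b, c, d in rows:
            acc[a][0] += b
            acc[a][1] += c
            acc[a][2] += d
    return {a: tuple(v) for a, v in acc.items()}

# Hypothetical (a, x, y, z) rows that already passed each branch's WHERE clause.
T1 = [("k1", 1, 2, 3), ("k2", 4, 5, 6)]
T2 = [("k1", 10, 20, 30)]
T3 = [("k2", 100, 200, 300), ("k1", 1, 1, 1)]

result = outer(branch(T1), branch(T2), branch(T3))
flat = outer(branch(T1 + T2 + T3))  # single aggregation over all rows
missing_t2 = outer(branch(T1), branch(T3))  # simulates T2's input being skipped
```

Comparing `result` against per-table totals computed separately is one cheap way to check whether a branch silently lost input.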



--


Florin
