hive-user mailing list archives

From Pala M Muthaia <mchett...@rocketfuelinc.com>
Subject Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Date Sat, 24 Aug 2013 00:55:49 GMT
I have attached the Hive 0.10 and 0.11 query plans for the sample query below,
for illustration.


On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia <mchettiar@rocketfuelinc.com> wrote:

> Hi,
>
> We are using DISTRIBUTE BY with custom reducer scripts in our query
> workload.
>
> After upgrading to Hive 0.11, queries that combine GROUP BY, DISTRIBUTE BY,
> and SORT BY with custom reducer scripts produced incorrect results. In
> particular, rows with the same value in the DISTRIBUTE BY column end up in
> multiple reducers and thus produce multiple rows in the final result, where
> we expect only one.
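One way to spot this symptom in a result set is to look for any grp value that appears in more than one output row. Below is a minimal, hypothetical Python check (the name `find_split_groups` and the `(grp, reducedValue)` row shape are assumptions chosen to match the sample query further down). It assumes the reducer script emits exactly one row per grp it receives, so any repeated grp means that group was split across reducers.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_split_groups(rows: Iterable[Tuple[str, object]]) -> List[str]:
    """Return grp values that occur in more than one output row.

    Hypothetical helper: assumes each output row is (grp, reducedValue) and
    that the reducer script emits exactly one row per grp it receives.
    """
    counts = Counter(grp for grp, _ in rows)
    # Sorted so the result does not depend on input order.
    return sorted(grp for grp, n in counts.items() if n > 1)


if __name__ == "__main__":
    results = [("a", 2), ("b", 1), ("a", 1)]
    print(find_split_groups(results))  # ['a']
```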
>
> I investigated a little and found the following behavior in Hive 0.11:
>
> - For these queries, Hive 0.11 produces a different plan, which yields
> incorrect results. The extra stage for DISTRIBUTE BY + TRANSFORM is missing,
> and the Transform operator for the custom reducer script is pushed into the
> reduce operator tree that contains the GROUP BY itself.
>
> - However, *if the SORT BY clause includes DESC*, the correct plan is
> produced and the results look correct too.
>
> Hive 0.10 produces the expected plan and correct results in all cases.
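The difference between the two plans can be modeled with a toy simulation. This is a minimal sketch under stated assumptions, not Hive's execution engine: it assumes the GROUP BY stage shuffles rows on both grouping keys (grp, val2), it substitutes a toy deterministic partitioner (`toy_partition`, hypothetical) for Hive's real hash partitioning, and it uses a hypothetical reducer that emits one (grp, count) row per run of equal grp values. `plan_two_stage` mirrors the expected Hive 0.10 plan (re-shuffle on grp before the transform); `plan_one_stage` mirrors the Hive 0.11 plan described above, where the transform runs inside the GROUP BY reducers.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (grp, val2)


def toy_partition(key: Tuple, num_reducers: int) -> int:
    """Toy stand-in for hash partitioning: sum of code points / ints, mod N."""
    total = 0
    for part in key:
        total += sum(map(ord, part)) if isinstance(part, str) else int(part)
    return total % num_reducers


def shuffle(rows: List[Row], key_fn, num_reducers: int) -> List[List[Row]]:
    """Route rows to reducers by key_fn, then sort each reducer's input."""
    buckets: Dict[int, List[Row]] = {i: [] for i in range(num_reducers)}
    for row in rows:
        buckets[toy_partition(key_fn(row), num_reducers)].append(row)
    return [sorted(buckets[i]) for i in range(num_reducers)]


def group_by(rows: List[Row]) -> List[Row]:
    """GROUP BY grp, val2: keep one row per distinct (grp, val2) pair."""
    return sorted(set(rows))


def transform(partition: List[Row]) -> List[Tuple[str, int]]:
    """Hypothetical reducer script: one (grp, count) row per run of equal grp."""
    return [(grp, len(list(run))) for grp, run in groupby(partition, key=lambda r: r[0])]


def plan_two_stage(rows: List[Row], n: int) -> List[Tuple[str, int]]:
    # Stage 1: GROUP BY, shuffled on (grp, val2).
    grouped = [r for p in shuffle(rows, lambda r: r, n) for r in group_by(p)]
    # Stage 2: DISTRIBUTE BY grp / SORT BY grp, val2, then the transform.
    return [out for p in shuffle(grouped, lambda r: (r[0],), n) for out in transform(p)]


def plan_one_stage(rows: List[Row], n: int) -> List[Tuple[str, int]]:
    # Transform runs in the GROUP BY reducers, still partitioned on (grp, val2).
    return [out for p in shuffle(rows, lambda r: r, n) for out in transform(group_by(p))]


if __name__ == "__main__":
    rows = [("a", 1), ("a", 2), ("a", 2)]
    print(plan_two_stage(rows, 2))  # [('a', 2)]
    print(plan_one_stage(rows, 2))  # [('a', 1), ('a', 1)]
```

With two reducers, ("a", 1) and ("a", 2) hash to different GROUP BY reducers, so the one-stage plan runs the script on each half of group "a" and emits it twice, while the two-stage plan brings the whole group together before the script runs.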
>
>
> To illustrate, here is a simplified repro setup:
>
> Table:
>
> CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING,
>   val4 INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
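With ROW FORMAT DELIMITED FIELDS TERMINATED BY ',', each line of the text file maps positionally onto the five columns. The sketch below approximates that mapping in Python, e.g. for sanity-checking test data before loading it. The function names are hypothetical, and the NULL handling (missing trailing fields, the default `\N` marker, and non-numeric or out-of-range INT text all become None) reflects our understanding of Hive's default text SerDe rather than an exact reimplementation.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

Value = Union[str, int, None]
COLUMNS: List[Tuple[str, str]] = [
    ("grp", "STRING"), ("val1", "STRING"), ("val2", "INT"),
    ("val3", "STRING"), ("val4", "INT"),
]
INT_PATTERN = re.compile(r"[+-]?\d+")


def parse_int(raw: str) -> Optional[int]:
    """INT columns: non-numeric or out-of-range text is treated as NULL."""
    if not INT_PATTERN.fullmatch(raw):
        return None
    value = int(raw)
    return value if -2**31 <= value < 2**31 else None


def parse_test_cluster_line(line: str) -> Dict[str, Value]:
    """Map one comma-delimited line onto the test_cluster columns.

    Extra fields beyond the fifth are simply ignored in this sketch.
    """
    fields = line.rstrip("\n").split(",")
    row: Dict[str, Value] = {}
    for i, (name, col_type) in enumerate(COLUMNS):
        raw = fields[i] if i < len(fields) else None  # missing trailing field
        if raw is None or raw == r"\N":  # \N is the default NULL marker
            row[name] = None
        elif col_type == "INT":
            row[name] = parse_int(raw)
        else:
            row[name] = raw
    return row
```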
>
> Query:
>
> ADD FILE reducer.py;
>
> FROM (
>   SELECT grp, val2
>   FROM test_cluster
>   GROUP BY grp, val2
>   DISTRIBUTE BY grp
>   SORT BY grp, val2  -- add DESC here to get correct results
> ) a
> REDUCE a.*
> USING 'reducer.py'
> AS grp, reducedValue;
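The thread does not include reducer.py itself. Purely as an illustration, here is a hypothetical streaming reducer of the kind this query assumes. Hive's REDUCE/TRANSFORM passes each input row to the script's stdin as tab-separated columns and reads tab-separated rows back from its stdout. The choice of reducedValue here (the number of rows per grp, which equals the number of distinct val2 values after GROUP BY grp, val2) is an assumption. The script only yields one row per grp if every row for that grp reaches the same script instance contiguously, which is what DISTRIBUTE BY grp plus SORT BY grp, val2 is supposed to guarantee.

```python
#!/usr/bin/env python3
"""Hypothetical reducer.py: emit one (grp, reducedValue) row per grp."""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    # Each input line is "grp\tval2"; rows are expected sorted by grp.
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for grp, run in groupby(rows, key=lambda fields: fields[0]):
        # Assumed reducedValue: number of rows for this grp, i.e. the number
        # of distinct val2 values, since GROUP BY grp, val2 deduplicated them.
        reduced_value = sum(1 for _ in run)
        yield f"{grp}\t{reduced_value}"


if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```

Because the script only collapses consecutive rows, feeding it a grp that is split (or interleaved with other groups) produces one output row per fragment, which matches the duplicated rows described above.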
>
>
> If I understand correctly, this is a bug. Is this a known issue? Any other
> insights? We have reverted to Hive 0.10 to avoid the incorrect results
> while we investigate this.
>
> I have the repro sample, with test data and scripts, if anybody is
> interested.
>
>
>
> Thanks,
> pala
>
