hive-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <>
Subject [jira] [Updated] (HIVE-2056) Generate single MR job for multi groupby query.
Date Mon, 09 May 2011 13:33:03 GMT


Amareshwari Sriramadasu updated HIVE-2056:

    Attachment: patch-2056.txt

The attached patch generates a single M/R job for a multi group by query whose group by
clauses share a non-empty set of common keys. It adds the configuration property
hive.multigroupby.singlemr to turn the optimization on and off.

It handles queries with no distinct expression or with a single common distinct expression;
it does not handle multiple distinct expressions yet. I will do that in a follow-up if required.
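
A minimal sketch of the idea, not the patch itself: assuming the single job shuffles rows on the common group by key, one reducer call sees every row for that key and can evaluate each group by there. The rows, the two group bys, and the names dest1 and dest2 are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (key, value) rows; not taken from the patch or its tests.
rows = [("a", "x1"), ("a", "x2"), ("a", "x1"), ("b", "y1"), ("b", "z1")]

def shuffle_on_common_key(rows):
    """Simulate a single shuffle: partition rows by the common group by key."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    return partitions

def reduce_partition(key, values):
    """Evaluate both group bys for one common key in a single reducer call."""
    # Group by 1: GROUP BY key -> count(DISTINCT value)
    by_key = (key, len(set(values)))
    # Group by 2: GROUP BY key, first character of value -> count(DISTINCT value)
    distinct = defaultdict(set)
    for value in values:
        distinct[(key, value[0])].add(value)
    by_key_prefix = sorted((group, len(seen)) for group, seen in distinct.items())
    return by_key, by_key_prefix

dest1, dest2 = [], []
for key, values in sorted(shuffle_on_common_key(rows).items()):
    by_key, by_key_prefix = reduce_partition(key, values)
    dest1.append(by_key)
    dest2.extend(by_key_prefix)

print(dest1)
print(dest2)
```

Both group bys here use the same distinct expression (value), which matches the "single common distinct expression" case described above. Without a common key there would be nothing to shuffle on that keeps every group intact.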

Performance numbers:
||Number of rows in table||Query||Time taken by 3 M/R jobs plan||Time taken by single M/R job plan||
|100 | query1| 58.416 seconds |22.099 seconds| 
|33682 million | query1 | Did not succeed | 11434.308 seconds|
|33682 million | query2 | 2hrs, 48mins, 15sec |16mins, 3sec.|
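
For comparison, the two rows where both plans completed can be reduced to speedup ratios. The short calculation below uses only the timings in the table; the helper name to_seconds is introduced here for illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0.0):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# query1 on the 100 row table (values from the table above)
three_jobs_100 = 58.416
single_job_100 = 22.099

# query2 on the 33682 million row table
three_jobs_q2 = to_seconds(hours=2, minutes=48, seconds=15)
single_job_q2 = to_seconds(minutes=16, seconds=3)

speedup_100 = three_jobs_100 / single_job_100
speedup_q2 = three_jobs_q2 / single_job_q2

print(f"query1, 100 rows: {speedup_100:.2f}x")            # 2.64x
print(f"query2, 33682 million rows: {speedup_q2:.2f}x")   # 10.48x
```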

Query1 did not succeed on the 33682 million row table with the existing plan: reducers failed
with OOM after 12 hours. I tried many combinations of number of reducers and Xmx values, but in vain.
I verified correctness row by row for the 100 row table, and by the number of rows in the
result for the 33682 million row table.

> Generate single MR job for multi groupby query.
> -----------------------------------------------
>                 Key: HIVE-2056
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-2056.txt

This message is automatically generated by JIRA.