hive-issues mailing list archives

From "Girish Kadli (Jira)" <>
Subject [jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
Date Mon, 03 Aug 2020 19:16:00 GMT


Girish Kadli commented on HIVE-3562:

I have a Hive query that returns different results with and without a LIMIT clause.

Call the result set of the query with LIMIT R1 and the result set of the query without LIMIT R2.

The discrepancies are:
 * R1 contains null values in some columns.
 * R2 doesn't contain the rows returned by R1.
 * All column values in R2 are non-null.
 * R2 returns correct results; R1 returns wrong results.

After debugging, I realised that *hive.limit.pushdown.memory.usage=0.1* is the root cause of this issue. After I set this property to -1, R1 returns correct rows with non-null column values, and the R1 rows are a subset of the R2 rows.
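
The comparison can be sketched roughly as follows. This is only an illustration: the src table and key column are borrowed from the issue description as stand-ins for the actual query, which is not shown here, and -1 is simply the value reported above to restore correct results.

{noformat}
-- Stand-in query; the real table and columns differ.

-- Setting that produced the wrong rows (R1 with null column values):
set hive.limit.pushdown.memory.usage=0.1;
select * from src order by key limit 10;

-- Setting after which R1 was correct (a subset of R2):
set hive.limit.pushdown.memory.usage=-1;
select * from src order by key limit 10;

-- R2, the same query without LIMIT, for comparison:
select * from src order by key;
{noformat}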

What could go wrong when *hive.limit.pushdown.memory.usage* is set to a low value such as 0.1?

Can it cause "with limit" Hive queries to return wrong results?

> Some limit can be pushed down to map stage
> ------------------------------------------
>                 Key: HIVE-3562
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis Ryu
>            Assignee: Navis Ryu
>            Priority: Trivial
>             Fix For: 0.12.0
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch,
HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch,
HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch
> Queries with limit clause (with reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> makes operator tree, 
> But LIMIT can be partially calculated in RS, reducing size of shuffling.

This message was sent by Atlassian Jira
