hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <>
Subject [jira] [Commented] (HIVE-15682) Eliminate per-row based dummy iterator creation
Date Mon, 13 Feb 2017 16:10:42 GMT


Xuefu Zhang commented on HIVE-15682:

Correct. It would be very helpful if you can try the following queries:

1. w/o HIVE-15580, w/ HIVE-15580, and w/ HIVE-15580+HIVE-15682 for an order by query like:
   select count(*) from (
     select request_lat from dwh.fact_trip
     where datestr > '2017-01-27'
     order by request_lat
   ) x;

2. w/o HIVE-15580, w/ HIVE-15580 for a group by query like:
   select count(*) from (
     select driver_uuid, avg(base_fare_usd) from dwh.fact_trip
     where datestr > '2017-01-01'
     group by driver_uuid
   ) x;

Also, it would be great if you could analyze the benchmark results, in particular to confirm
why HIVE-15682 has an adverse performance impact. Thanks.

> Eliminate per-row based dummy iterator creation
> -----------------------------------------------
>                 Key: HIVE-15682
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 2.2.0
>         Attachments: HIVE-15682.patch
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated. This is
> because {{SparkReduceRecordHandler}} is able to handle single key-value pairs. We can refactor
> this part of the code 1. to remove the need for an iterator and 2. to optimize the code path for
> per-(key, value) processing (instead of (key, value iterator) processing). It would also be
> great if we could measure the performance after the optimizations and compare it to the
> performance prior to HIVE-15580.
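
To make the refactoring in the description concrete, here is a minimal sketch in plain Java. It is not Hive's code: the Handler class and the method names processValues, processPair, withDummyIterator and withoutIterator are hypothetical stand-ins for illustration only. It contrasts wrapping every (key, value) pair in a throwaway single-element iterator with handing the pair directly to the handler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class DummyIteratorSketch {

    /** Hypothetical stand-in for a reduce-side record handler; not Hive's actual API. */
    static class Handler {
        long rows = 0;

        // Iterator-based path: one key with a stream of values.
        void processValues(String key, Iterator<String> values) {
            while (values.hasNext()) {
                values.next();
                rows++;
            }
        }

        // Pair-based path: one key with exactly one value, no iterator involved.
        void processPair(String key, String value) {
            rows++;
        }
    }

    // "Before": every pair is wrapped in a dummy single-element iterator,
    // which allocates a list and an iterator object per input row.
    static long withDummyIterator(List<String[]> pairs) {
        Handler handler = new Handler();
        for (String[] kv : pairs) {
            handler.processValues(kv[0], Collections.singletonList(kv[1]).iterator());
        }
        return handler.rows;
    }

    // "After": every pair is passed straight to the handler.
    static long withoutIterator(List<String[]> pairs) {
        Handler handler = new Handler();
        for (String[] kv : pairs) {
            handler.processPair(kv[0], kv[1]);
        }
        return handler.rows;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"k1", "v1"});
        pairs.add(new String[] {"k1", "v2"});
        pairs.add(new String[] {"k2", "v3"});

        // Both paths should see the same number of rows.
        System.out.println("with dummy iterator: " + withDummyIterator(pairs));
        System.out.println("without iterator:    " + withoutIterator(pairs));
    }
}
```

Both paths process the same rows; the second simply avoids the per-row allocations. Whether that shows up as a measurable gain in Hive on Spark is exactly what the requested benchmark would need to establish.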
