hive-issues mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Updated] (HIVE-17979) Tez: Improve ReduceRecordSource passDownKey copying
Date Mon, 06 Nov 2017 23:43:00 GMT


Gopal V updated HIVE-17979:
    Attachment: HIVE-17979.2.patch

> Tez: Improve ReduceRecordSource passDownKey copying
> ---------------------------------------------------
>                 Key: HIVE-17979
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-17979.1.patch, HIVE-17979.2.patch
> Tez does not use a single Key stream for both sides of the join, so each input gets its own ReduceRecordSource:
> {code}
> sources[tag] = new ReduceRecordSource();
> {code}
> As a result, each input stream has its own deserialized key (because the tag is not part of the Key byte stream), so a 2-table join has 2 ReduceRecordSource instances.
> This means that the passDownKey is only an optimization when the Key, List<Value> has more than 1 value in it. Otherwise the copy is entirely wasted CPU cycles, because it deserializes the entire row to extract the key and then discards the row.
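
The cost described above can be illustrated with a minimal sketch. It is not taken from Hive or from the attached patches: the class PassDownKeySketch, its methods, and the use of a plain byte-array copy as a stand-in for key deserialization are all assumptions made for illustration. It compares copying the key for every key group with deferring the copy until a second value arrives for the same key.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these names come from Hive. A byte-array copy
// stands in for the (more expensive) key deserialization in the description.
class PassDownKeySketch {

    /** Eager: copy the key for every key group before looking at its values. */
    static int eagerKeyCopies(List<List<String>> groups, byte[] key) {
        int copies = 0;
        for (List<String> values : groups) {
            byte[] passDownKey = Arrays.copyOf(key, key.length);
            copies++;
            for (String value : values) {
                consume(passDownKey, value);
            }
        }
        return copies;
    }

    /** Lazy: defer the copy until a second value shows up for the same key. */
    static int lazyKeyCopies(List<List<String>> groups, byte[] key) {
        int copies = 0;
        for (List<String> values : groups) {
            Iterator<String> it = values.iterator();
            if (!it.hasNext()) {
                continue; // empty group: nothing to do
            }
            String first = it.next();
            if (!it.hasNext()) {
                // Single value: use the key in place, no copy needed.
                consume(key, first);
                continue;
            }
            // More than one value: the copy is now shared across the group.
            byte[] passDownKey = Arrays.copyOf(key, key.length);
            copies++;
            consume(passDownKey, first);
            while (it.hasNext()) {
                consume(passDownKey, it.next());
            }
        }
        return copies;
    }

    /** Placeholder for whatever downstream work receives the key and value. */
    static void consume(byte[] key, String value) {
        // Intentionally empty in this sketch.
    }
}
```

With three key groups of sizes 1, 2 and 1, the eager variant makes three copies and the lazy variant makes one, which matches the observation that the copy only pays off when a key has more than one value.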

This message was sent by Atlassian JIRA
