hive-dev mailing list archives

From "Vikram Dixit K (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6144) Implement non-staged MapJoin
Date Sat, 11 Jan 2014 01:35:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868590#comment-13868590 ]

Vikram Dixit K commented on HIVE-6144:
--------------------------------------

Hi Navis,

This is a nice improvement. Could you update your patch to the latest trunk? I think we should
enable this by default as it makes things better without breaking any other expectations.

Thanks
Vikram.

> Implement non-staged MapJoin
> ----------------------------
>
>                 Key: HIVE-6144
>                 URL: https://issues.apache.org/jira/browse/HIVE-6144
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-6144.1.patch.txt
>
>
> For a map join, all data in the small aliases is hashed and stored in a temporary file
> by MapRedLocalTask. But for aliases without a filter or projection, that does not seem
> necessary. For example:
> {noformat}
> select a.* from src a join src b on a.key=b.key;
> {noformat}
> produces a plan like this:
> {noformat}
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a 
>           TableScan
>             alias: a
>             HashTable Sink Operator
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               Position of Big Table: 1
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
> Table src (alias a) is fetched and stored as-is in MRLocalTask. With this patch, the plan
> can look like this:
> {noformat}
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                   File Output Operator
>       Local Work:
>         Map Reduce Local Work
>           Alias -> Map Local Tables:
>             a 
>               Fetch Operator
>                 limit: -1
>           Alias -> Map Local Operator Tree:
>             a 
>               TableScan
>                 alias: a
>           Has Any Stage Alias: false
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
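
The difference between the two quoted plans can be pictured with a small sketch. This is a simplified Python model, not Hive's code: the function names (`build_hash_table`, `staged_map_join`, `non_staged_map_join`) and the use of `pickle` for the temporary file are assumptions made purely for illustration. The staged variant mirrors the first plan (a separate local stage writes the hashed small table to a temporary file, which the map stage then loads); the non-staged variant mirrors the second plan (the small table is fetched and hashed inside the map stage itself).

```python
import os
import pickle
import tempfile
from collections import defaultdict


def build_hash_table(rows, key):
    """Group the small table's rows by join key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return dict(table)


def probe(table, big_rows, key):
    """Emit the small-side row for every matching big-side row (select a.*)."""
    return [small_row
            for big_row in big_rows
            for small_row in table.get(big_row[key], [])]


def staged_map_join(small_rows, big_rows, key):
    # Separate local stage: hash the small table and dump it to a temp file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        pickle.dump(build_hash_table(small_rows, key), f)
        path = f.name
    # Map stage: load the dumped hash table, then probe it with the big table.
    try:
        with open(path, "rb") as f:
            table = pickle.load(f)
    finally:
        os.remove(path)
    return probe(table, big_rows, key)


def non_staged_map_join(small_rows, big_rows, key):
    # Single stage: hash the small table in place, skipping the temp file.
    return probe(build_hash_table(small_rows, key), big_rows, key)
```

Both functions return the same rows; the non-staged version only avoids the intermediate write and read, which is the saving the description argues for when the small alias has no filter or projection.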



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
