hive-issues mailing list archives

From "Deepak Jaiswal (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17848) Bucket Map Join : Implement an efficient way to minimize loading hash table
Date Tue, 07 Nov 2017 21:04:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Jaiswal updated HIVE-17848:
----------------------------------
    Attachment: HIVE-17848.2.patch

> Bucket Map Join : Implement an efficient way to minimize loading hash table
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-17848
>                 URL: https://issues.apache.org/jira/browse/HIVE-17848
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>         Attachments: HIVE-17848.2.patch
>
>
> In a bucket map join, each task loads its own copy of the hash table. This is inefficient: the load is IO heavy, and because multiple copies of the same hash table are held at once, the tables may get GCed on a busy system.
> Implement a subcache with a soft reference to each hash table, keyed by its bucket ID, so that a task can reuse a table that is already loaded.
> This needs changes on the Tez side to push the bucket ID to TezProcessor.
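
A minimal sketch of the subcache idea described above, not the code in the attached patch. The class name BucketHashTableCache, the method getOrLoad, and the loader callback are all hypothetical names chosen for illustration; the sketch assumes the bucket ID is already available to the caller (the issue notes that Tez must supply it) and uses only java.lang.ref.SoftReference and ConcurrentHashMap from the JDK.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical illustration only: not a Hive or Tez class.
final class BucketHashTableCache<T> {
    // One softly referenced hash table per bucket ID. The GC may clear
    // a soft reference under memory pressure, so a lookup can miss.
    private final Map<Integer, SoftReference<T>> cache = new ConcurrentHashMap<>();

    // Returns the cached table for bucketId, or loads and caches it.
    // The loader stands in for the IO-heavy hash table load.
    T getOrLoad(int bucketId, IntFunction<T> loader) {
        T table = lookup(bucketId);
        if (table != null) {
            return table;
        }
        // Serialize loads so concurrent tasks do not each load a copy.
        synchronized (this) {
            table = lookup(bucketId);
            if (table == null) {
                table = loader.apply(bucketId);
                cache.put(bucketId, new SoftReference<>(table));
            }
            return table;
        }
    }

    // Returns null if the bucket was never loaded or its table was collected.
    private T lookup(int bucketId) {
        SoftReference<T> ref = cache.get(bucketId);
        return (ref == null) ? null : ref.get();
    }
}
```

With this shape, a second task asking for the same bucket gets the same table instance as long as it has not been collected; if the soft reference was cleared, the table is simply loaded again. A single lock is the simplest choice here; per-bucket locking would allow different buckets to load in parallel.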



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
