hive-issues mailing list archives

From "Panagiotis Garefalakis (Jira)" <>
Subject [jira] [Updated] (HIVE-22731) Probe MapJoin hashtables for row level filtering
Date Wed, 22 Jan 2020 16:25:00 GMT


Panagiotis Garefalakis updated HIVE-22731:
    Status: Patch Available  (was: In Progress)

> Probe MapJoin hashtables for row level filtering
> ------------------------------------------------
>                 Key: HIVE-22731
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, llap
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22731.1.patch, HIVE-22731.2.patch, HIVE-22731.WIP.patch, decode_time_bars.pdf
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Currently, RecordReaders such as ORC support filtering only at coarse-grained levels, namely:
File, Stripe (64 to 256 MB), and Row group (10k rows). They skip a set of rows only
if they can guarantee that none of the rows can pass a filter (usually given as searchable
> However, a significant amount of time can be spent decoding rows with multiple columns
that are not even used in the final result. See the figure, where "original" is what happens today
and "LazyDecode" skips decoding rows that do not match the key.
> To enable more fine-grained filtering in the particular case of a MapJoin, we could
use the key HashTable built from the smaller table to skip deserializing the columns of rows
in the larger table whose join key does not match any entry, and thus save CPU time.
> This Jira investigates this direction. 
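> The idea can be illustrated with a minimal, self-contained sketch. It is not Hive or ORC code:
the names ProbeDecodeSketch, EncodedRow, decodePayload and probeAndDecode are hypothetical, a plain
Set<Long> stands in for the MapJoin key HashTable, and a byte array stands in for the encoded
non-key columns. The assumption is that the join key can be read cheaply before the other columns
are decoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ProbeDecodeSketch {

    /** Hypothetical encoded row: the join key is cheap to read, the payload is costly to decode. */
    static final class EncodedRow {
        final long joinKey;
        final byte[] payload;

        EncodedRow(long joinKey, byte[] payload) {
            this.joinKey = joinKey;
            this.payload = payload;
        }
    }

    /** Counts how many payloads were actually decoded, to make the saving visible. */
    static int decodeCount = 0;

    /** Stand-in for the expensive deserialization of the non-key columns. */
    static String decodePayload(EncodedRow row) {
        decodeCount++;
        return new String(row.payload, StandardCharsets.UTF_8);
    }

    /**
     * Probes the small-table key set (assumed stand-in for the MapJoin HashTable)
     * before decoding; rows whose key cannot join are skipped without being decoded.
     */
    static List<String> probeAndDecode(List<EncodedRow> bigTable, Set<Long> smallTableKeys) {
        List<String> out = new ArrayList<>();
        for (EncodedRow row : bigTable) {
            if (!smallTableKeys.contains(row.joinKey)) {
                continue; // no matching key: skip the decode entirely
            }
            out.add(row.joinKey + ":" + decodePayload(row));
        }
        return out;
    }
}
```

In this sketch the decode cost is paid only for rows that can contribute to the join result, so the saving grows with the fraction of large-table rows that have no matching key.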

This message was sent by Atlassian Jira