hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-13040) Handle empty bucket creations more efficiently
Date Tue, 23 Feb 2016 17:24:18 GMT


Ashutosh Chauhan commented on HIVE-13040:

This patch addresses two distinct issues:
* Don't create empty buckets for Tez. We know for sure that Tez can handle missing bucket
files while doing BMJ & SMBJ. MR, however, explicitly checks the number of files before
attempting BMJ & SMBJ, so if we don't create empty files for MR, we risk BMJ & SMBJ being
disabled later on for MR.
* The above means we still end up creating logically empty files for MR (which covers the
majority of test cases). For such cases, ORC currently writes a header & footer. This patch
includes a change to write nothing at all (i.e. create a 0-length file) in such cases. On the
read side there are two ways to handle such 0-length files: either make the ORC reader
resilient to them, or exclude such files altogether during split generation. I chose the
second approach as it is more efficient, since it avoids wasteful processing of those files.
So there are changes related to that as well.
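
To illustrate the second approach (excluding 0-length files before split generation), here
is a minimal sketch. It is not the code from the patch: the class EmptyBucketFilter and the
method filesToSplit are hypothetical names, and it uses java.nio.file on a local filesystem
as a stand-in for the Hadoop FileSystem API, purely to show the idea that an empty file is
dropped by its length alone, so no reader is ever opened on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not Hive's or ORC's actual
// split-generation code.
final class EmptyBucketFilter {

    // Returns the files that should be considered for split generation,
    // skipping 0-length files based on their size alone.
    static List<Path> filesToSplit(List<Path> bucketFiles) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path file : bucketFiles) {
            if (Files.size(file) > 0) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: one logically empty bucket and one bucket with data.
        Path dir = Files.createTempDirectory("buckets");
        Path empty = Files.createFile(dir.resolve("000000_0"));
        Path data = Files.write(dir.resolve("000001_0"), new byte[] {1, 2, 3});

        // Only the non-empty bucket file is handed on for split generation.
        System.out.println(filesToSplit(List.of(empty, data)));
    }
}
```

The alternative (a reader that tolerates 0-length files) would still open every empty
file; filtering on length up front avoids that work entirely.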

> Handle empty bucket creations more efficiently 
> -----------------------------------------------
>                 Key: HIVE-13040
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-13040.2.patch, HIVE-13040.3.patch, HIVE-13040.4.patch, HIVE-13040.5.patch,
>                      HIVE-13040.6.patch, HIVE-13040.7.patch, HIVE-13040.patch

This message was sent by Atlassian JIRA
