Date: Fri, 26 Feb 2016 00:36:18 +0000 (UTC)
From: "Gopal V (JIRA)"
To: dev@hive.apache.org
Subject: [jira] [Created] (HIVE-13161) ORC: Always do sloppy overlaps for DiskRanges

Gopal V created HIVE-13161:
------------------------------

             Summary: ORC: Always do sloppy overlaps for DiskRanges
                 Key: HIVE-13161
                 URL: https://issues.apache.org/jira/browse/HIVE-13161
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.3.0, 2.1.0
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran

The selected columns are sometimes only a few bytes apart (particularly for nulls, which compress tightly), and the reads aren't merged.

The WORST_UNCOMPRESSED_SLOP is only applied in the PPD case, and it is applied more for safety than to reduce the total number of round-trip calls to the filesystem.

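To make the idea concrete, here is a minimal sketch of what a "sloppy" overlap check could look like: two ranges are treated as mergeable when the gap between them is at most some slop value. This is an illustration only, not the Hive patch. The `Range` class is a hypothetical stand-in for DiskRange, `overlapWithSlop` and `mergeWithSlop` are made-up names, and the 16-byte slop is an arbitrary value chosen for the example (it is not the value of WORST_UNCOMPRESSED_SLOP).

```java
import java.util.ArrayList;
import java.util.List;

class SloppyMerge {
    // Hypothetical stand-in for ORC's DiskRange: the bytes from offset to end.
    static final class Range {
        long offset;
        long end;
        Range(long offset, long end) { this.offset = offset; this.end = end; }
    }

    // True when the ranges overlap, touch, or are separated by at most `slop` bytes.
    static boolean overlapWithSlop(long leftA, long rightA,
                                   long leftB, long rightB, long slop) {
        if (leftA <= leftB) {
            return rightA + slop >= leftB;
        }
        return rightB + slop >= leftA;
    }

    // Collapses ranges in place; like the original, assumes the list is sorted.
    static void mergeWithSlop(List<Range> ranges, long slop) {
        Range prev = null;
        for (int i = 0; i < ranges.size(); ++i) {
            Range current = ranges.get(i);
            if (prev != null && overlapWithSlop(prev.offset, prev.end,
                                                current.offset, current.end, slop)) {
                prev.offset = Math.min(prev.offset, current.offset);
                prev.end = Math.max(prev.end, current.end);
                ranges.remove(i);
                i -= 1; // re-examine the element that shifted into slot i
            } else {
                prev = current;
            }
        }
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(0, 100));
        ranges.add(new Range(108, 200));   // only 8 bytes after the previous range
        ranges.add(new Range(5000, 6000)); // far away, should stay separate
        mergeWithSlop(ranges, 16);         // 16 is an arbitrary example slop
        for (Range r : ranges) {
            System.out.println(r.offset + "-" + r.end);
        }
    }
}
```

With a slop of 0 this behaves like the strict overlap check, so the first two ranges would stay separate and cost two reads. The trade-off is reading a few unneeded bytes in exchange for fewer round trips.
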
{code}
  /**
   * Update the disk ranges to collapse adjacent or overlapping ranges. It
   * assumes that the ranges are sorted.
   * @param ranges the list of disk ranges to merge
   */
  static void mergeDiskRanges(List<DiskRange> ranges) {
    DiskRange prev = null;
    for(int i=0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      if (prev != null && overlap(prev.offset, prev.end,
          current.offset, current.end)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }
...
  private static boolean overlap(long leftA, long rightA, long leftB, long rightB) {
    if (leftA <= leftB) {
      return rightA >= leftB;
    }
    return rightB >= leftA;
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)