hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13161) ORC: Always do sloppy overlaps for DiskRanges
Date Fri, 26 Feb 2016 00:36:18 GMT
Gopal V created HIVE-13161:
------------------------------

             Summary: ORC: Always do sloppy overlaps for DiskRanges
                 Key: HIVE-13161
                 URL: https://issues.apache.org/jira/browse/HIVE-13161
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.3.0, 2.1.0
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran


The selected columns are sometimes only a few bytes apart (particularly for nulls, which compress
tightly), but their reads aren't merged.

The WORST_UNCOMPRESSED_SLOP is only applied in the PPD case, and it is applied more for safety
than to reduce the total number of round-trip calls to the filesystem.

{code}
  /**
   * Update the disk ranges to collapse adjacent or overlapping ranges. It
   * assumes that the ranges are sorted.
   * @param ranges the list of disk ranges to merge
   */
  static void mergeDiskRanges(List<DiskRange> ranges) {
    DiskRange prev = null;
    for(int i=0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      if (prev != null && overlap(prev.offset, prev.end,
          current.offset, current.end)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }
...
  private static boolean overlap(long leftA, long rightA, long leftB, long rightB) {
    if (leftA <= leftB) {
      return rightA >= leftB;
    }
    return rightB >= leftA;
  }

{code}
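For illustration, a "sloppy" overlap could be sketched roughly as below. This is only a sketch under assumptions, not the actual patch: the `Range` class is a hypothetical stand-in for `DiskRange`, `overlapWithSlop` and `mergeWithSlop` are made-up names, and the slop of 16 bytes is an arbitrary value chosen for the example (the right tolerance is not settled here). The only change from the code above is that the gap check tolerates up to `slop` bytes between two ranges.

```java
import java.util.ArrayList;
import java.util.List;

public class SloppyMerge {
  /** Hypothetical stand-in for DiskRange: a byte range from offset to end. */
  static final class Range {
    long offset;
    long end;
    Range(long offset, long end) { this.offset = offset; this.end = end; }
    @Override public String toString() { return "[" + offset + ", " + end + ")"; }
  }

  /** True when the ranges overlap, touch, or are at most slop bytes apart. */
  static boolean overlapWithSlop(long leftA, long rightA,
                                 long leftB, long rightB, long slop) {
    if (leftA <= leftB) {
      return rightA + slop >= leftB;
    }
    return rightB + slop >= leftA;
  }

  /** Merges sorted ranges in place, bridging gaps of up to slop bytes. */
  static void mergeWithSlop(List<Range> ranges, long slop) {
    Range prev = null;
    for (int i = 0; i < ranges.size(); ++i) {
      Range current = ranges.get(i);
      if (prev != null && overlapWithSlop(prev.offset, prev.end,
          current.offset, current.end, slop)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }

  public static void main(String[] args) {
    List<Range> ranges = new ArrayList<>();
    ranges.add(new Range(0, 100));
    ranges.add(new Range(104, 200));   // only 4 bytes after the previous range
    ranges.add(new Range(5000, 6000)); // far away, should stay separate
    mergeWithSlop(ranges, 16);         // 16 is an arbitrary example slop
    System.out.println(ranges);        // prints [[0, 200), [5000, 6000)]
  }
}
```

With a slop of 0 this behaves like the existing overlap check, so the first two ranges above would stay separate and cost two reads. The trade-off is that a merged read also fetches the few bytes in the gap, which should be cheap compared to an extra round trip.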


