hive-dev mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Created] (HIVE-11665) ORC StringDictionaryReader should not use Chunked buffers
Date Thu, 27 Aug 2015 06:41:45 GMT
Gopal V created HIVE-11665:

             Summary: ORC StringDictionaryReader should not use Chunked buffers
                 Key: HIVE-11665
             Project: Hive
          Issue Type: Improvement
          Components: File Formats
    Affects Versions: 1.3.0, 2.0.0
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran

ORC String Dictionary Reader is slow due to the chunking of the input stream.

  private void readDictionaryStream(InStream in) throws IOException {
    if (in != null) { // Guard against empty dictionary stream.
      if (in.available() > 0) {
        dictionaryBuffer = new DynamicByteArray(64, in.available());
        // Since this is the start of a stripe, invalidate the cache.
        dictionaryBufferInBytesCache = null;
      }
    } else {
      dictionaryBuffer = null;
    }
  }

Chunking the data offers no advantage on the read path: the buffer is filled once and
never goes through a grow() operation, so there is no memory saving to be had.
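
For illustration only, here is a minimal sketch of what an unchunked read could look like, assuming the dictionary length is known before the read starts. It is not the Hive implementation: the class FlatDictionarySketch, the helpers readFlat and entry, and the offsets array are all hypothetical names, and a plain java.io.InputStream stands in for ORC's InStream.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class FlatDictionarySketch {

    // Hypothetical helper: read exactly `length` bytes into one contiguous array.
    // A single allocation replaces the chunk table; nothing ever has to grow.
    static byte[] readFlat(InputStream in, int length) throws IOException {
        byte[] buf = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(buf, off, length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
        return buf;
    }

    // Hypothetical lookup: entry i spans offsets[i] .. offsets[i + 1] in the flat
    // buffer, so it is a plain array slice with no chunk-boundary arithmetic.
    static String entry(byte[] dict, int[] offsets, int i) {
        return new String(dict, offsets[i], offsets[i + 1] - offsets[i],
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: three dictionary entries stored back to back.
        byte[] raw = "applebananacherry".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 5, 11, 17};

        byte[] dict = readFlat(new ByteArrayInputStream(raw), raw.length);
        for (int i = 0; i < offsets.length - 1; i++) {
            System.out.println(entry(dict, offsets, i));
        }
    }
}
```

With a flat array, every dictionary lookup is a direct slice; a chunked buffer has to map each offset to a chunk index and handle entries that straddle a chunk boundary.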

This message was sent by Atlassian JIRA
