hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
Date Mon, 26 Nov 2012 21:14:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504103#comment-13504103 ]

Carl Steinbach commented on HIVE-3746:
--------------------------------------

Currently HS2 uses the following Thrift structures to represent a resultset:

{noformat}
// Represents a rowset
struct TRowSet {
  // The starting row offset of this rowset.
  1: required i64 startRowOffset
  2: required list<TRow> rows
}

// Represents a row in a rowset.
struct TRow {
  1: required list<TColumnValue> colVals
}

union TColumnValue {
  1: TBoolValue   boolVal      // BOOLEAN
  2: TByteValue   byteVal      // TINYINT
  3: TI16Value    i16Val       // SMALLINT
  4: TI32Value    i32Val       // INT
  5: TI64Value    i64Val       // BIGINT, TIMESTAMP
  6: TDoubleValue doubleVal    // FLOAT, DOUBLE
  7: TStringValue stringVal    // STRING, LIST, MAP, STRUCT, UNIONTYPE, BINARY
}

// A Boolean column value.
struct TBoolValue {
  // NULL if value is unset.
  1: optional bool value
}

...

struct TStringValue {
  1: optional string value
}
{noformat}

The problem with this approach is that Thrift unions are not very efficient, and we pay this
cost on a per-field basis, i.e. once for every column value of every row. Instead, we should
make the resultset structure column-oriented as follows:

{noformat}
// Represents a rowset
struct TRowSet {
  // The starting row offset of this rowset.
  1: required i64 startRowOffset
  2: required list<TColumn> columns
}

union TColumn {
  1: list<TBoolValue> boolColumn
  2: list<TByteValue> byteColumn
  3: list<TI16Value> i16Column
  4: list<TI32Value> i32Column
  5: list<TI64Value> i64Column
  6: list<TDoubleValue> doubleColumn
  7: list<TStringValue> stringColumn
}
{noformat}
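
To make the difference concrete, here is a minimal Python sketch of pivoting a row-oriented rowset into the column-oriented layout. The dataclasses are hypothetical stand-ins for the Thrift-generated types, not the real generated code; the `kind` field models which union member is set, and `None` models a value struct whose optional `value` is unset (NULL). The `rows_to_columns` helper is likewise an illustrative name, not part of HS2.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical stand-ins for the Thrift-generated classes (assumption:
# the real generated code looks different).
@dataclass
class TColumnValue:
    kind: str              # which union member is set, e.g. "i32Val"
    value: Optional[Any]   # None models an unset (NULL) value


@dataclass
class TRow:
    colVals: List[TColumnValue]


@dataclass
class TColumn:
    kind: str                    # which union member is set, e.g. "i32Column"
    values: List[Optional[Any]]  # one entry per row; None models NULL


def rows_to_columns(rows: List[TRow]) -> List[TColumn]:
    """Pivot a list of rows into one TColumn per column position."""
    if not rows:
        return []
    columns = []
    for i, first in enumerate(rows[0].colVals):
        # "i32Val" -> "i32Column", "stringVal" -> "stringColumn", ...
        kind = first.kind[: -len("Val")] + "Column"
        columns.append(TColumn(kind, [row.colVals[i].value for row in rows]))
    return columns


rows = [
    TRow([TColumnValue("i32Val", 1), TColumnValue("stringVal", "a")]),
    TRow([TColumnValue("i32Val", 2), TColumnValue("stringVal", None)]),
]
columns = rows_to_columns(rows)

# Row-oriented: one union per cell. Column-oriented: one union per column.
unions_row_oriented = sum(len(row.colVals) for row in rows)
unions_column_oriented = len(columns)
```

In this sketch the row-oriented form carries one union per cell (rows times columns), while the column-oriented form carries one union per column regardless of how many rows are in the batch.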


                
> TRowSet resultset structure should be column-oriented
> -----------------------------------------------------
>
>                 Key: HIVE-3746
>                 URL: https://issues.apache.org/jira/browse/HIVE-3746
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Server Infrastructure
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
