hive-issues mailing list archives

From "Nemon Lou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14143) RawDataSize of RCFile is zero after analyze
Date Sat, 02 Jul 2016 03:00:14 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359940#comment-15359940 ]

Nemon Lou commented on HIVE-14143:
----------------------------------

[~pxiong] Thanks for your attention.

RawDataSize for RCFile is the sum of the serialized sizes of the selected columns.
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java#L229
{code}
  public long getRawDataSerializedSize() {
    long serializedSize = 0;
    for (int i = 0; i < fieldInfoList.length; ++i) {
      serializedSize += fieldInfoList[i].getSerializedSize();
    }
    return serializedSize;
  }
{code}

During projection push-down, READ_ALL_COLUMNS is always set to false, regardless of whether
the specified column list is empty.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L656
{code}
for (String alias : aliases) {
  Operator<? extends OperatorDesc> op = this.mrwork.getAliasToWork().get(alias);
  if (op instanceof TableScanOperator) {
    TableScanOperator ts = (TableScanOperator) op;
    // push down projections.
    ColumnProjectionUtils.appendReadColumns(
        jobConf, ts.getNeededColumnIDs(), ts.getNeededColumns());
    // push down filters
    pushFilters(jobConf, ts);

    AcidUtils.setTransactionalTableScan(job, ts.getConf().isAcidTable());
  }
}
{code}
For analyze, the specified column IDs are empty, which means all columns should be read.

As a result, no columns are read:
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java#L104
{code}
List<Integer> notSkipIDs = new ArrayList<Integer>();
if (conf == null || ColumnProjectionUtils.isReadAllColumns(conf)) {
  for (int i = 0; i < size; i++) {
    notSkipIDs.add(i);
  }
} else {
  notSkipIDs = ColumnProjectionUtils.getReadColumnIDs(conf);
}
cachedLazyStruct = new ColumnarStruct(
    cachedObjectInspector, notSkipIDs, serdeParams.getNullSequence());
{code}
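
To make the chain above concrete, here is a minimal, self-contained sketch of the three steps. It is not Hive code: the class RawDataSizeSketch, the ProjectionConf holder, and the per-column sizes are all hypothetical stand-ins, and the size calculation is simplified to sum only over the columns that were read (in Hive the loop runs over every field, and the assumption here is that skipped fields contribute nothing).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the three steps described above; none of this is Hive code. */
public class RawDataSizeSketch {

    /** Hypothetical stand-in for the projection settings kept in the job configuration. */
    static final class ProjectionConf {
        boolean readAllColumns = true; // default: no projection requested
        final List<Integer> readColumnIds = new ArrayList<>();
    }

    /** Models the push-down step: the flag is cleared even when ids is empty. */
    static void appendReadColumns(ProjectionConf conf, List<Integer> ids) {
        conf.readColumnIds.addAll(ids);
        conf.readAllColumns = false;
    }

    /** Models the SerDe step: decide which columns will be deserialized. */
    static List<Integer> notSkipIds(ProjectionConf conf, int columnCount) {
        if (conf == null || conf.readAllColumns) {
            List<Integer> all = new ArrayList<>();
            for (int i = 0; i < columnCount; i++) {
                all.add(i);
            }
            return all;
        }
        return conf.readColumnIds;
    }

    /** Models the size calculation: only columns that were read contribute. */
    static long rawDataSize(long[] columnSizes, List<Integer> readIds) {
        long total = 0;
        for (int id : readIds) {
            total += columnSizes[id];
        }
        return total;
    }

    public static void main(String[] args) {
        long[] columnSizes = {10, 20, 30}; // made-up serialized sizes per column
        ProjectionConf conf = new ProjectionConf();
        appendReadColumns(conf, new ArrayList<>()); // analyze passes no column ids
        // Prints 0: the flag is false and the id list is empty, so nothing is read.
        System.out.println(rawDataSize(columnSizes, notSkipIds(conf, columnSizes.length)));
    }
}
```

If the appendReadColumns call is skipped in this sketch, the flag stays true and the same table yields 60, which is the behaviour analyze would need.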

> RawDataSize of RCFile is zero after analyze 
> --------------------------------------------
>
>                 Key: HIVE-14143
>                 URL: https://issues.apache.org/jira/browse/HIVE-14143
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 1.2.1, 2.1.0
>            Reporter: Nemon Lou
>            Assignee: Nemon Lou
>            Priority: Minor
>         Attachments: HIVE-14143.patch
>
>
> After running the following analyze command, rawDataSize becomes zero for RCFile tables.
> {noformat}
>  analyze table RCFILE_TABLE compute statistics ;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
