hive-dev mailing list archives

From j.prasant...@gmail.com
Subject Re: Review Request 14162: HIVE-4340: ORC should provide raw data size
Date Mon, 16 Sep 2013 22:10:58 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14162/
-----------------------------------------------------------

(Updated Sept. 16, 2013, 10:10 p.m.)


Review request for hive, Ashutosh Chauhan and Owen O'Malley.


Changes
-------

Added the UNION case to the ORC writer's raw data size computation.
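As a rough illustration of why UNION needs its own case, here is a minimal sketch of a recursive per-value size estimate. Everything in it is hypothetical: the class `RawSizeEstimator`, the `UnionValue` holder, the fixed per-type widths, counting strings by character count, and the one-byte union tag are all assumptions for illustration, not the code in this patch. The key point is that a union holds exactly one branch at a time, so only the selected branch contributes to the size.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: estimate the raw (deserialized) size of a row value. */
class RawSizeEstimator {

    /** Hypothetical stand-in for a union value: which branch is set, plus its payload. */
    public static final class UnionValue {
        final byte tag;
        final Object value;

        public UnionValue(byte tag, Object value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Returns an estimated size in bytes; the per-type widths are illustrative assumptions. */
    public static long rawSize(Object v) {
        if (v == null) {
            return 0;
        } else if (v instanceof Boolean || v instanceof Byte) {
            return 1;
        } else if (v instanceof Short) {
            return 2;
        } else if (v instanceof Integer || v instanceof Float) {
            return 4;
        } else if (v instanceof Long || v instanceof Double) {
            return 8;
        } else if (v instanceof String) {
            // Character count, an assumption of this sketch (not encoded bytes).
            return ((String) v).length();
        } else if (v instanceof byte[]) {
            return ((byte[]) v).length;
        } else if (v instanceof UnionValue) {
            // A union holds exactly one branch, so only that branch is counted,
            // plus one byte for the tag (again an assumption of this sketch).
            return 1 + rawSize(((UnionValue) v).value);
        } else if (v instanceof List) {
            long total = 0;
            for (Object e : (List<?>) v) {
                total += rawSize(e);
            }
            return total;
        } else if (v instanceof Map) {
            long total = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                total += rawSize(e.getKey()) + rawSize(e.getValue());
            }
            return total;
        }
        throw new IllegalArgumentException("unsupported type: " + v.getClass());
    }
}
```

For example, a union whose selected branch is the string "abc" is estimated at 1 + 3 = 4 bytes under these assumptions. The real writer may derive sizes differently (for instance from column statistics).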


Bugs: HIVE-4340
    https://issues.apache.org/jira/browse/HIVE-4340


Repository: hive-git


Description
-------

ORC's SerDe currently does no real work (it simply passes rows through to the writer), so it does not
calculate a raw data size. WriterImpl, however, has enough information to provide one.

WriterImpl should compute a raw data size for each row, aggregate these sizes per stripe and record
the total in the stripe information (as RCFile currently does in its key header), and give FileSinkOperator
access to the per-row size.
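The bookkeeping described above (a per-row size, a per-stripe total, and access to the last row's size) can be sketched as follows. The class `StripeRawSizeTracker` and its methods are hypothetical names for illustration only; they are not WriterImpl's actual fields or API, and the list of stripe totals merely stands in for the stripe metadata the real writer would record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of per-stripe raw-data-size bookkeeping in a writer. */
class StripeRawSizeTracker {
    private long currentStripeRawSize = 0; // sum over rows added since the last flush
    private long lastRowRawSize = 0;       // exposed so a caller can read the per-row size
    private long fileRawSize = 0;          // sum over all flushed stripes
    private final List<Long> stripeRawSizes = new ArrayList<>();

    /** Called once per row with that row's estimated raw size. */
    void addRow(long rowRawSize) {
        lastRowRawSize = rowRawSize;
        currentStripeRawSize += rowRawSize;
    }

    /** Called when a stripe is flushed; records its total as stripe metadata would. */
    void flushStripe() {
        stripeRawSizes.add(currentStripeRawSize);
        fileRawSize += currentStripeRawSize;
        currentStripeRawSize = 0;
    }

    long getLastRowRawSize() {
        return lastRowRawSize;
    }

    long getFileRawSize() {
        return fileRawSize;
    }

    List<Long> getStripeRawSizes() {
        return Collections.unmodifiableList(stripeRawSizes);
    }
}
```

Adding rows of 10 and 5 bytes, flushing, then adding a 7-byte row and flushing again yields stripe totals of 15 and 7 and a file total of 22. Keeping the running per-stripe sum means no second pass over the rows is needed at flush time.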

FileSinkOperator should be able to obtain the raw data size from either the SerDe or, when the RecordWriter
can provide it, the RecordWriter.
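One way FileSinkOperator could choose between the two sources is a capability check on the writer, falling back to the SerDe. This is a minimal sketch under stated assumptions: `RawSizeSource`, `StatsProvidingWriter` and `SerDeStatsView` are hypothetical names, not Hive's actual interfaces or the mechanism used in this patch.

```java
/** Hypothetical sketch: choose the raw-data-size source for a written row. */
class RawSizeSource {

    /** Hypothetical capability interface a record writer may implement. */
    interface StatsProvidingWriter {
        long getRawDataSize();
    }

    /** Hypothetical minimal view of the stats a SerDe reports. */
    interface SerDeStatsView {
        long getRawDataSize();
    }

    /**
     * Prefers the writer's figure when the writer can provide one (as ORC's
     * writer would), otherwise falls back to the SerDe's figure.
     */
    static long rawDataSize(Object recordWriter, SerDeStatsView serDeStats) {
        if (recordWriter instanceof StatsProvidingWriter) {
            return ((StatsProvidingWriter) recordWriter).getRawDataSize();
        }
        return serDeStats.getRawDataSize();
    }
}
```

An opt-in interface leaves existing writers (RCFile, text, and so on) untouched: they do not implement it, so their raw data size keeps coming from the SerDe as before.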


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION 
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
  ql/src/test/resources/orc-file-dump.out fac5326 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3 

Diff: https://reviews.apache.org/r/14162/diff/


Testing
-------

All ORC-related unit tests and q-file tests pass.


Thanks,

Prasanth_J

