hive-dev mailing list archives

Subject Review Request 14162: HIVE-4340: ORC should provide raw data size
Date Mon, 16 Sep 2013 21:29:29 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for hive, Ashutosh Chauhan and Owen O'Malley.

Bugs: HIVE-4340

Repository: hive-git


ORC's SerDe currently does nothing, and hence does not calculate a raw data size.  WriterImpl,
however, has enough information to provide one.

WriterImpl should compute a raw data size for each row, aggregate the sizes per stripe, and record
the total in the stripe information, as RCFile currently does in its key header. It should also
allow the FileSinkOperator access to the size per row.
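
As a rough illustration of the per-row and per-stripe accounting described above, here is a minimal sketch. It is not the patch's code: StripeRawSizeTracker, rawSizeOf and the fixed rows-per-stripe cutoff are all hypothetical, and the size rule (8 bytes per Long, character count per String, 0 for anything else) is only a stand-in for whatever WriterImpl would actually compute.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-row / per-stripe raw size tracking; not the real WriterImpl. */
class StripeRawSizeTracker {
    // Assumption: a stripe is closed after a fixed number of rows, to keep the sketch simple.
    private final long rowsPerStripe;
    private final List<Long> stripeRawSizes = new ArrayList<>();
    private long currentStripeRawSize = 0;
    private long rowsInCurrentStripe = 0;
    private long lastRowRawSize = 0;

    StripeRawSizeTracker(long rowsPerStripe) {
        this.rowsPerStripe = rowsPerStripe;
    }

    /** Stand-in size rule: Long = 8 bytes, String = its length, null or other = 0. */
    static long rawSizeOf(Object[] row) {
        long size = 0;
        for (Object field : row) {
            if (field instanceof Long) {
                size += 8;
            } else if (field instanceof String) {
                size += ((String) field).length();
            }
        }
        return size;
    }

    /** Computes the row's raw size, adds it to the open stripe, and closes the stripe if full. */
    void addRow(Object[] row) {
        lastRowRawSize = rawSizeOf(row);
        currentStripeRawSize += lastRowRawSize;
        rowsInCurrentStripe++;
        if (rowsInCurrentStripe == rowsPerStripe) {
            flushStripe();
        }
    }

    /** Records the open stripe's total (where stripe information would be written) and resets. */
    void flushStripe() {
        if (rowsInCurrentStripe == 0) {
            return;
        }
        stripeRawSizes.add(currentStripeRawSize);
        currentStripeRawSize = 0;
        rowsInCurrentStripe = 0;
    }

    /** Size of the most recently added row, which a caller such as a sink operator could read. */
    long getLastRowRawSize() {
        return lastRowRawSize;
    }

    List<Long> getStripeRawSizes() {
        return stripeRawSizes;
    }
}
```

A caller would invoke addRow for each row, read getLastRowRawSize afterwards for its own statistics, and call flushStripe once more at close to record any partially filled stripe.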

FileSinkOperator should be able to get the raw data size from the RecordWriter when the
RecordWriter can provide it, and from the SerDe otherwise.
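
One possible shape for that choice is sketched below, assuming the writer advertises the capability through an optional interface. Every name here (RawSizeProvider, RowWriter, PlainWriter, SizeAwareWriter, SinkSketch) is hypothetical and chosen for illustration; none of them are Hive classes, and the patch may wire this up differently.

```java
/** Hypothetical capability interface: implemented only by writers that track raw size. */
interface RawSizeProvider {
    long getRawDataSize();
}

/** Hypothetical stand-in for a record writer. */
interface RowWriter {
    void write(String row);
}

/** A writer that cannot report raw size, so the caller must fall back to the SerDe. */
class PlainWriter implements RowWriter {
    @Override
    public void write(String row) {
        // Writing is omitted in this sketch.
    }
}

/** A writer that records the raw size of the last row it wrote. */
class SizeAwareWriter implements RowWriter, RawSizeProvider {
    private long lastRawSize = 0;

    @Override
    public void write(String row) {
        // Stand-in size rule: character count of the row.
        lastRawSize = row.length();
    }

    @Override
    public long getRawDataSize() {
        return lastRawSize;
    }
}

class SinkSketch {
    /** Prefers the writer's raw size when it offers one, else uses the SerDe-reported value. */
    static long rawSizeForRow(RowWriter writer, long serdeReportedSize) {
        if (writer instanceof RawSizeProvider) {
            return ((RawSizeProvider) writer).getRawDataSize();
        }
        return serdeReportedSize;
    }
}
```

With this shape the sink needs no format-specific knowledge: formats whose SerDe already reports a size keep working unchanged, and a writer opts in simply by implementing the extra interface.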


  ql/src/java/org/apache/hadoop/hive/ql/exec/ bcee201 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 6268617 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ c80fb02 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 90260fd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ c454f32 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 72e779a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 8e74b91 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 44961ce 
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/ e6569f4 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/ b93db84 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/ PRE-CREATION 
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
  ql/src/test/resources/orc-file-dump.out fac5326 
  serde/src/java/org/apache/hadoop/hive/serde2/ 1c09dc3 



All unit tests and q file tests related to ORC are passing.


