hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From j.prasant...@gmail.com
Subject Re: Review Request 14243: HIVE-5325: Implement statistics providing ORC writer and reader interfaces
Date Tue, 01 Oct 2013 01:12:19 GMT


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java, line 360
> > <https://reviews.apache.org/r/14243/diff/2/?file=356668#file356668line360>
> >
> >     Can you add a comment when this if condition will be true and when it will be
false. It isn't obvious.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1794
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1794>
> >
> >     Log a message here, saying unknown category.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1806
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1806>
> >
> >     What is more useful is how much size these objects will take when loaded back
in memory (e.g while doing map-joins). What you have is how much memory they take up while
ORC writer has buffered them in memory. So, what we want is numVals * sizeof(boolean)

Makes sense. Updated to use JavaDataModel. 


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1868
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1868>
> >
> >     Log a msg here too.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java, line 23
> > <https://reviews.apache.org/r/14243/diff/2/?file=356671#file356671line23>
> >
> >     Class JavaDataModel has almost identical info. Consider using that.

Removed StatsUtils as it is no longer required.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java, line 264
> > <https://reviews.apache.org/r/14243/diff/2/?file=356674#file356674line264>
> >
> >     You have removed this test. Is this getting covered via new one?

I wrote that test case earlier to make sure old ORC format (v0.11 format) can be read without
issues. For that new orc file written in old format was added to test resources  which will
be read by a unit test. The write part of the unit test case is no longer required since we
directly have old format orc file in resources. Hence I removed it. 


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14243/#review26490
-----------------------------------------------------------


On Sept. 24, 2013, 10:18 p.m., Prasanth_J wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14243/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2013, 10:18 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Owen O'Malley.
> 
> 
> Bugs: HIVE-5325
>     https://issues.apache.org/jira/browse/HIVE-5325
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-5324 adds new interfaces that can be implemented by ORC reader/writer to provide
statistics. Writer provided statistics is used to update table/partition level statistics
in metastore. Reader provided statistics can be used for reducer estimation, CBO etc. in the
absence of metastore statistics.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java PRE-CREATION 
>   ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 34b2305 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION 
>   ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
>   ql/src/test/resources/orc-file-dump.out fac5326 
> 
> Diff: https://reviews.apache.org/r/14243/diff/
> 
> 
> Testing
> -------
> 
> ORC related unit and qfile tests are passing.
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message