hive-dev mailing list archives

From "Eric Hanson" <eh...@microsoft.com>
Subject Re: Review Request: Change ORC tree readers to return batches of rows instead of a row
Date Wed, 24 Apr 2013 23:01:50 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10712/#review19674
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40621>

    I recommend that this method take and return a ColumnVector instead of an Object, since I don't
think it would ever make sense not to take a ColumnVector subtype.
    
    This applies to all nextVector methods.
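
    A minimal sketch of the typed signature, for illustration only. The classes below are hypothetical stand-ins (not the real Hive ColumnVector or tree reader classes), and the counter that fills the vector is a placeholder for actual stream decoding:

```java
// Hypothetical stand-ins for the vector classes, reduced to what the sketch needs.
abstract class ColumnVector {
    boolean noNulls = true;
    final boolean[] isNull;

    ColumnVector(int size) {
        isNull = new boolean[size];
    }
}

final class LongColumnVector extends ColumnVector {
    final long[] vector;

    LongColumnVector(int size) {
        super(size);
        vector = new long[size];
    }
}

abstract class TreeReaderSketch {
    // Typed signature: callers can neither pass nor receive an arbitrary Object.
    abstract ColumnVector nextVector(ColumnVector previous, int batchSize);
}

final class LongTreeReaderSketch extends TreeReaderSketch {
    private long next = 0; // placeholder for values decoded from the stream

    @Override
    LongColumnVector nextVector(ColumnVector previous, int batchSize) {
        // Reuse the caller's vector when it has the right type and enough capacity.
        LongColumnVector result;
        if (previous instanceof LongColumnVector
                && ((LongColumnVector) previous).vector.length >= batchSize) {
            result = (LongColumnVector) previous;
        } else {
            result = new LongColumnVector(batchSize);
        }
        for (int i = 0; i < batchSize; i++) {
            result.vector[i] = next++;
            result.isNull[i] = false;
        }
        result.noNulls = true;
        return result;
    }
}
```

    The covariant return type lets a subclass promise the concrete vector type without a cast at the call site.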



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40622>

    add blank line before comment



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40623>

    "".getBytes() is probably going to allocate a new byte array every time you call it.
    
    For better performance, create a static final class variable that holds an empty byte array,
or just pass dictionaryBytes, 0, 0.
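
    Both options could look roughly like the sketch below. The Ref class and its setRef(bytes, start, length) method are hypothetical stand-ins for whatever the patch uses to point a vector entry at a byte range; only the shared empty array and the zero-length reference are the point here:

```java
final class EmptyBytesSketch {
    // Shared zero-length array; safe to share because it has no contents to mutate.
    static final byte[] EMPTY_BYTES = new byte[0];

    // Hypothetical stand-in for a by-reference entry in a bytes column vector.
    static final class Ref {
        byte[] bytes;
        int start;
        int length;

        void setRef(byte[] source, int start, int length) {
            this.bytes = source;
            this.start = start;
            this.length = length;
        }
    }

    // Option 1: reference the shared constant instead of calling "".getBytes().
    static void setEmpty(Ref ref) {
        ref.setRef(EMPTY_BYTES, 0, 0);
    }

    // Option 2: reference the existing dictionary buffer with a zero length.
    static void setEmptyFromDictionary(Ref ref, byte[] dictionaryBytes) {
        ref.setRef(dictionaryBytes, 0, 0);
    }
}
```

    Either way, no allocation happens per empty value.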
    



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40624>

    The plan was to not support struct yet, but later to support a field of a struct just
as if it were a regular column. Struct field access would just be a naming convention.
    
    A query might not access every field of a struct. This reads every field of the struct.
    
    I think probably we should leave this unimplemented and then come back and do it later
using the naming-convention technique.
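
    To illustrate the naming-convention idea, here is a rough sketch. The dotted "struct.field" names, the column indexes, and both helper methods are assumptions made up for this example, not an agreed design:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StructFieldNamingSketch {
    // Expose each struct field as its own column, named "<struct>.<field>" (assumed convention).
    static Map<String, Integer> flatten(String structName, List<String> fieldNames, int firstIndex) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            columns.put(structName + "." + fieldNames.get(i), firstIndex + i);
        }
        return columns;
    }

    // Select only the columns the query references, so unused struct fields are never read.
    static List<Integer> columnsToRead(Map<String, Integer> columns, List<String> referenced) {
        List<Integer> result = new ArrayList<>();
        for (String name : referenced) {
            Integer index = columns.get(name);
            if (index == null) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            result.add(index);
        }
        return result;
    }
}
```

    With this shape, a query that touches one field of a three-field struct would read one column instead of three.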



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40625>

    Add a Javadoc comment describing the method.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
<https://reviews.apache.org/r/10712/#comment40626>

    I don't understand this. map and struct are not supported yet, so I think this should
be unimplemented.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
<https://reviews.apache.org/r/10712/#comment40627>

    If there are no nulls in a stripe or split for a column, we should be able to take a fast
code path that doesn't need this check and if-else.
    
    I haven't seen noNulls get set anywhere. What is the plan for setting noNulls as an optimization?
That has a big performance impact in QE (about 30% time savings for filters and arithmetic).
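
    Roughly what I have in mind, as a sketch only. The method names and the present[] input are hypothetical; the point is that the reader computes noNulls once per batch so that downstream loops can skip the per-row check:

```java
final class NoNullsSketch {
    // Reader side: fill isNull from a per-row "present" flag and report whether the batch has no nulls.
    static boolean fillIsNull(boolean[] present, boolean[] isNull, int n) {
        boolean noNulls = true;
        for (int i = 0; i < n; i++) {
            isNull[i] = !present[i];
            noNulls &= present[i];
        }
        return noNulls;
    }

    // Consumer side: add a scalar to every non-null entry, with a branch-free fast path.
    static void addScalar(long[] vector, boolean[] isNull, boolean noNulls, int n, long scalar) {
        if (noNulls) {
            // Fast path: no per-row null check inside the loop.
            for (int i = 0; i < n; i++) {
                vector[i] += scalar;
            }
        } else {
            for (int i = 0; i < n; i++) {
                if (!isNull[i]) {
                    vector[i] += scalar;
                }
            }
        }
    }
}
```

    If the stripe-level metadata already says the column has no nulls, the reader could set the flag without inspecting any rows at all.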
    


- Eric Hanson


On April 24, 2013, 9:53 p.m., Sarvesh Sakalanaga wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10712/
> -----------------------------------------------------------
> 
> (Updated April 24, 2013, 9:53 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> The patch contains changes to the ORC reader to return a batch of rows instead of a single row.
> A new method called nextBatch() is added to the ORC reader and to the ORC tree readers. Currently
> only int, long, short, double, float, string, and struct support batch processing.
> 
> 
> This addresses bug HIVE-4370.
>     https://issues.apache.org/jira/browse/HIVE-4370
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java 246170d 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java fc4e53b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReader.java 05240ce 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java d044cd8 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java 2825c64 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/10712/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sarvesh Sakalanaga
> 
>

