hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From j.prasant...@gmail.com
Subject Re: Review Request 37985: HIVE-11705 refactor SARG stripe filtering for ORC into a method
Date Sat, 05 Sep 2015 01:59:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37985/#review97837
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 272)
<https://reviews.apache.org/r/37985/#comment153887>

    Who calls this method? If this is required later can you put this in that patch. Focus
only on refactoring in this patch. Easy to review too :)



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 275)
<https://reviews.apache.org/r/37985/#comment153888>

    Split into two assignments?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 284)
<https://reviews.apache.org/r/37985/#comment153885>

    Split statement?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 348)
<https://reviews.apache.org/r/37985/#comment153877>

    nit: rename to neededColumnNames?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 362)
<https://reviews.apache.org/r/37985/#comment153880>

    I think this is messed up from the beginning. The second argument is list of all projected
column names and not just sarg column names. So rename of this method will be helpful.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 378)
<https://reviews.apache.org/r/37985/#comment153881>

    Same naming confusion here.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 388)
<https://reviews.apache.org/r/37985/#comment153875>

    This function name is misleading. This config represents the columns that are needed to
be read not necessarily the column names in the SARG. SARG columns will be subset of needed
columns.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 392)
<https://reviews.apache.org/r/37985/#comment153876>

    Similarly as above. None of these methods gets the sarg column names instead it gets the
included column names.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 1388)
<https://reviews.apache.org/r/37985/#comment153882>

    This is the actual internal column index corresponding to SARG columns. Rename suggestion
again. mapSargColumnsToInternalColIdx()?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 1393)
<https://reviews.apache.org/r/37985/#comment153893>

    where is this used? I don't see it being used anywhere. Can this be moved to relevant
jira?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java (line 1228)
<https://reviews.apache.org/r/37985/#comment153889>

    indexInSourceTable will never be null. Right?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java (line 1231)
<https://reviews.apache.org/r/37985/#comment153894>

    Whats this translation doing exactly?
    Is this trying to map between sarg column -> table column index -> orc internal
column index?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java (line 517)
<https://reviews.apache.org/r/37985/#comment153884>

    Create a bug?



serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java (line 146)
<https://reviews.apache.org/r/37985/#comment153883>

    I also don't understand why would it contain duplicates. IIRC, this was probably caused
by multiple concatenation to the READ_COLUMN_IDS_CONF_STR. I am not sure if this happens anymore,
in any case we should create a bug and remove this code. Or may be remove in the next patch.
Also use, guava Splitter.splitToList?



storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java (line 28)
<https://reviews.apache.org/r/37985/#comment153895>

    I would prefer setting new name based on internal column names. 
    For example:
    Schema from filedump: struct<_col0:tinyint,_col1:smallint>
    
    Current SARG: leaf-0 = (EQUALS i1 100)
    
    After this patch: leaf-0 = (EQUALS _col1 100)
    
    If we can map to internal names, the it will be easy to map sarg column names to internal
column index. i1 -> 1 (after ripping off _col)


- Prasanth_J


On Sept. 2, 2015, 1:06 a.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37985/
> -----------------------------------------------------------
> 
> (Updated Sept. 2, 2015, 1:06 a.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 05efc5f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java 8beff4b 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java fcb3746 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 4480600 
>   ql/src/java/org/apache/hadoop/hive/ql/io/sarg/ConvertAstToSearchArg.java e034650 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 2dc15f9

>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b809a23 
>   serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 10086c5 
>   storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java 0778935

>   storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java d27ac16

> 
> Diff: https://reviews.apache.org/r/37985/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message