hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Blue" <b...@apache.org>
Subject Re: Review Request 32499: HIVE-10086: Hive throws error when accessing Parquet file schema using field name match
Date Thu, 26 Mar 2015 18:17:30 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32499/#review77918
-----------------------------------------------------------

Ship it!


I found a few minor things, but this all looks correct to me. Nice job!


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126257>

    Minor: I initially wondered why you didn't use `GroupType#getType(String fieldName)` but
I see you're using `equalsIgnoreCase`. This would be more clear if you used a helper method,
`getFieldIgnoreCase(String fieldName)`.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126258>

    Rather than cast, use `fieldType.asGroupType()` because it will throw a consistent error
message.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126256>

    This is a great java trick I didn't know about to get an each-with-index method!
    
    Minor: Is there a better name than `i$` though? Typically names with dollar-signs signal
generated code to me. What about `iterWithIndex` or something a bit more readable?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126249>

    I recommend not using the Type constructors because I'd like to deprecate them in the
near future. This could be:
    
    ```java
    schemaTypes.add(Types.optional(BINARY).named(colName));
    ```
    
    There are other places this could be done, too, but it's up to you whether to file a follow-up
or fix it now.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126250>

    Not a blocker, but I'd really like to know what this branching indicates about the table.
Missing DDL?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126253>

    This gets the columns without changing the order, and the selected columns are the first
N where N is the size of the list of names. So the only effect of this line is to shorten
the schema to just what is defined in the table? In that case is it necessary to do this or
can we just pass the table schema to the projection call later? Assuming the projected ids
are always `< columnNamesList.size()` then it should do the same thing.



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
<https://reviews.apache.org/r/32499/#comment126251>

    This isn't a blocker, but I find it odd that the "HIVE_TABLE_SCHEMA" isn't a Hive schema.
It's a Parquet schema. It might be too late to rename the constant's value, but renaming the
variable might help readability.



ql/src/test/queries/clientpositive/parquet_table_with_subschema.q
<https://reviews.apache.org/r/32499/#comment126261>

    Could you reverse the order of zip and Street? I want to make sure reordering is specifically
tested.


- Ryan Blue


On March 25, 2015, 3:42 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32499/
> -----------------------------------------------------------
> 
> (Updated March 25, 2015, 3:42 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10086
>     https://issues.apache.org/jira/browse/HIVE-10086
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Attached is the patch that handles schema that do not match between Parquet and Hive.
> 
> The access to Parquet data is with name matching in this case. The table column may have
different schema order, but if the name matches the parquet column name, then the value is
retrieved.
> 
> Also, if the Hive schema has columns and struct elements that do not match with the Parquet
schema, then it will return NULL values instead.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java
57ae7a9740d55b407cadfc8bc030593b29f90700 
>   ql/src/test/queries/clientpositive/parquet_schema_evolution.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_table_with_subschema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_schema_evolution.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_table_with_subschema.q.out PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/32499/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message