hive-dev mailing list archives

From "Sergio Pena" <sergio.p...@cloudera.com>
Subject Re: Review Request 32499: HIVE-10086: Hive throws error when accessing Parquet file schema using field name match
Date Thu, 26 Mar 2015 20:31:41 GMT


> On March 26, 2015, 7:11 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java, line 65
> > <https://reviews.apache.org/r/32499/diff/1/?file=906071#file906071line65>
> >
> >     why remove static ?

Thanks Mohit.
At first I did not know what benefit 'private static' provided, so I thought it
was just extra code.

But I now know that it has some benefits: it guarantees that the method does not touch
instance fields, and since static calls are statically bound, execution may be a little faster.
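As a minimal sketch of the first benefit (the class and method names here are hypothetical and not taken from DataWritableReadSupport), a 'private static' helper can only work with what is passed in as parameters:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class, for illustration only; not part of the patch.
class ColumnLookup {
    private final List<String> columns;

    ColumnLookup(List<String> columns) {
        this.columns = columns;
    }

    // 'private static': the compiler rejects any use of this.columns here,
    // so the helper depends only on its arguments.
    private static int indexOfIgnoreCase(List<String> names, String wanted) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equalsIgnoreCase(wanted)) {
                return i;
            }
        }
        return -1; // name not found
    }

    // Instance method that hands its own state to the static helper.
    int indexOf(String name) {
        return indexOfIgnoreCase(columns, name);
    }
}
```

Any speed difference is likely to be small in practice; the clearer gain is that the signature documents that the method needs no instance state.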


> On March 26, 2015, 7:11 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java, line 90
> > <https://reviews.apache.org/r/32499/diff/1/?file=906071#file906071line90>
> >
> >     Looks like this method is called recursively (to deal with nested fields). Can we have duplicate column names across nesting levels?

Yes, Parquet supports duplicate column names across nesting levels.
Here is an example:

optional group a {
  required binary name;
  optional group addr {
    optional binary a;
  }
}

optional group b {
  required binary name;
  optional group addr {
    optional binary b;
  }
}
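To show why such duplicates are not ambiguous when fields are resolved one nesting level at a time, here is a rough sketch. It assumes plain Java maps as a stand-in for the Parquet schema types, and the class and method names are hypothetical; it is not the code in the patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a group is a Map from field name to either
// a nested Map (group) or a String (primitive type name).
class NestedNameMatch {

    // Builds the example schema shown above.
    static Map<String, Object> exampleSchema() {
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", group("a"));
        root.put("b", group("b"));
        return root;
    }

    // One top-level group: a 'name' field plus an 'addr' group
    // holding a single binary field called innerField.
    private static Map<String, Object> group(String innerField) {
        Map<String, Object> addr = new LinkedHashMap<>();
        addr.put(innerField, "binary");
        Map<String, Object> g = new LinkedHashMap<>();
        g.put("name", "binary");
        g.put("addr", addr);
        return g;
    }

    // Walks the path one level at a time, so a name is only looked up
    // inside its parent group. Returns null when any step has no match.
    static Object resolve(Map<String, Object> schema, List<String> path) {
        Object current = schema;
        for (String name : path) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a primitive
            }
            current = ((Map<?, ?>) current).get(name);
            if (current == null) {
                return null; // no field with this name at this level
            }
        }
        return current;
    }
}
```

With this schema, the path a.addr.a resolves to a binary field while a.addr.b has no match, even though the names a, name and addr each appear more than once in the schema.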


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32499/#review77924
-----------------------------------------------------------


On March 25, 2015, 10:42 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32499/
> -----------------------------------------------------------
> 
> (Updated March 25, 2015, 10:42 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10086
>     https://issues.apache.org/jira/browse/HIVE-10086
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Attached is the patch that handles schemas that do not match between Parquet and Hive.
> 
> In this case, Parquet data is accessed by name matching. The table columns may be in a different order than in the Parquet schema, but if a column name matches the Parquet column name, then the value is retrieved.
> 
> Also, if the Hive schema has columns or struct elements that do not match the Parquet schema, then NULL values are returned for them instead.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700 
>   ql/src/test/queries/clientpositive/parquet_schema_evolution.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_table_with_subschema.q PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_schema_evolution.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_table_with_subschema.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/32499/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>

