hive-dev mailing list archives

From "Ashish Singh" <asi...@cloudera.com>
Subject Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
Date Thu, 27 Nov 2014 01:19:13 GMT


> On Nov. 25, 2014, 12:37 a.m., Sergio Pena wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java, line 35
> > <https://reviews.apache.org/r/28372/diff/1/?file=773791#file773791line35>
> >
> >     This class will need more work in order to detect unannotated types as specified in the following tickets:
> >     https://issues.apache.org/jira/browse/HIVE-8909
> >     https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md
> >     
> >     I was going to add more comments, but then I noticed that this code will look a little similar (for loops and converters) to the HIVE-8909 patch. Of course, the HIVE-8909 patch returns converter objects, and this returns the column names & types. So I was wondering whether we could use the DataWritableRecordConverter.java class to get the correct converters, and then translate the converters to column names & types.
> >     
> >     This is what I found while debugging valid parquet files:
> >     Each example has 3 blocks:
> >     - parquet file schema
> >     - hive column names & types
> >     - converters returned by DataWritableRecordConverter
> >     
> >     Could we use the converter objects and translate them to names? 
> >     
> >     message SingleFieldGroupInList {
> >       optional group single_element_groups (LIST) {
> >         repeated group single_element_group {
> >           required int64 count;
> >         }
> >       }
> >     }
> >     
> >     single_element_groups ARRAY<BIGINT>
> >     
> >     	hivestructconverter                             
> >     		hivecollectionconverter:elementconverter	array<>
> >     			EINT64_CONVERTER							bigint
> >     			
> >     --------------------------------------------------------------------------------
> >     
> >     message HiveRequiredGroupInList {
> >       optional group locations (LIST) {
> >         repeated group bag {
> >           required group element {
> >             required double latitude;
> >             required double longitude;
> >           }
> >         }
> >       }
> >     }
> >     			
> >     locations ARRAY<STRUCT<latitude: DOUBLE, longitude: DOUBLE>>
> >     
> >     	hivestructconverter
> >     		hivecollectionconverter:elementconverter	array<>
> >     			hivestructconverter							struct<>
> >     				DOUBLE_CONVERTER							double
> >     				DOUBLE_CONVERTER							double
> >     				
> >     --------------------------------------------------------------------------------
> >
> >     message UnannotatedListOfPrimitives {
> >       repeated int32 list_of_ints;
> >     }
> >     	
> >     list_of_ints ARRAY<INT>
> >     
> >     	hivestructconverter
> >     		RepeatedPrimitiveConverter						array<>
> >     			EINT32_CONVERTER								int
> >     			
> >     --------------------------------------------------------------------------------

Sergio, thanks for the review and valuable comments!

DataWritableRecordConverter expects a Hive schema as input, so I am not sure it can be used to create one.
I have updated the patch to handle the rules listed in https://github.com/rdblue/incubator-parquet-format/blob/PARQUET-113-add-list-and-map-spec/LogicalTypes.md.
Kindly take a look.
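
For readers following along, here is a minimal sketch of how the list rules could map a Parquet schema to Hive type names. It is not the code in the patch: Node and ParquetToHiveSketch are hypothetical stand-ins rather than parquet-mr or Hive classes, only a handful of primitive types are covered, and the test for the synthetic repeated layer is one reading of the draft LogicalTypes.md linked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified schema node; not a parquet-mr or Hive class.
final class Node {
    final String repetition;      // "required", "optional" or "repeated"
    final String name;
    final String primitive;       // e.g. "int64"; null for groups
    final boolean listAnnotated;  // true when the group carries (LIST)
    final List<Node> children;

    private Node(String repetition, String name, String primitive,
                 boolean listAnnotated, Node... children) {
        this.repetition = repetition;
        this.name = name;
        this.primitive = primitive;
        this.listAnnotated = listAnnotated;
        this.children = Arrays.asList(children);
    }

    static Node primitive(String repetition, String type, String name) {
        return new Node(repetition, name, type, false);
    }

    static Node group(String repetition, String name, Node... children) {
        return new Node(repetition, name, null, false, children);
    }

    static Node list(String repetition, String name, Node repeated) {
        return new Node(repetition, name, null, true, repeated);
    }

    boolean isGroup() { return primitive == null; }
}

final class ParquetToHiveSketch {
    // Hive type of a field, treating an unannotated repeated field as an array.
    static String hiveType(Node n) {
        String t = plainType(n);
        return n.repetition.equals("repeated") ? "array<" + t + ">" : t;
    }

    // Type of a node, ignoring its own repetition.
    private static String plainType(Node n) {
        if (n.listAnnotated) {
            return "array<" + elementType(n, n.children.get(0)) + ">";
        }
        return n.isGroup() ? structType(n) : primitiveType(n.primitive);
    }

    // Assumed reading of the list rules: a repeated group with exactly one field,
    // not named "array" or "<list name>_tuple", is a synthetic layer whose single
    // field is the element. Otherwise the repeated field itself is the element.
    private static String elementType(Node list, Node repeated) {
        boolean synthetic = repeated.isGroup()
            && repeated.children.size() == 1
            && !repeated.name.equals("array")
            && !repeated.name.equals(list.name + "_tuple");
        return synthetic ? hiveType(repeated.children.get(0)) : plainType(repeated);
    }

    private static String structType(Node g) {
        List<String> fields = new ArrayList<>();
        for (Node c : g.children) {
            fields.add(c.name + ":" + hiveType(c));
        }
        return "struct<" + String.join(",", fields) + ">";
    }

    private static String primitiveType(String p) {
        switch (p) {
            case "int32":   return "int";
            case "int64":   return "bigint";
            case "double":  return "double";
            case "float":   return "float";
            case "boolean": return "boolean";
            default: throw new IllegalArgumentException("not covered in this sketch: " + p);
        }
    }

    public static void main(String[] args) {
        // The three schemas from the review comment, built by hand.
        Node single = Node.list("optional", "single_element_groups",
            Node.group("repeated", "single_element_group",
                Node.primitive("required", "int64", "count")));
        Node locations = Node.list("optional", "locations",
            Node.group("repeated", "bag",
                Node.group("required", "element",
                    Node.primitive("required", "double", "latitude"),
                    Node.primitive("required", "double", "longitude"))));
        Node ints = Node.primitive("repeated", "int32", "list_of_ints");

        System.out.println(single.name + " " + hiveType(single));
        System.out.println(locations.name + " " + hiveType(locations));
        System.out.println(ints.name + " " + hiveType(ints));
    }
}
```

Run against the three schemas from the review comment, the sketch yields array<bigint>, array<struct<latitude:double,longitude:double>> and array<int>, which match the Hive types listed there apart from case and spacing.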


- Ashish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review62911
-----------------------------------------------------------


On Nov. 27, 2014, 1:08 a.m., Ashish Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 1:08 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ColInfoFromParquetFile.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd
>   ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/28372/diff/
> 
> 
> Testing
> -------
> 
> Tested by adding appropriate qTests.
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>

