hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashish Singh" <asi...@cloudera.com>
Subject Re: Review Request 28372: HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
Date Wed, 24 Dec 2014 04:19:19 GMT


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > I don't know why I didn't catch this earlier, but I'd also like to see a unit test
for the new schema converter.

Added unit tests.


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java, line
141
> > <https://reviews.apache.org/r/28372/diff/8/?file=790874#file790874line141>
> >
> >     Why does this use a location property instead of the SD location?

tbl is just a Properties object, not actual Table object that has access to table's SD location.


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 692
> > <https://reviews.apache.org/r/28372/diff/8/?file=790875#file790875line692>
> >
> >     This appears to store the location in the storage descriptor, so I think that
the javadoc is no longer correct.

Removed


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java, line
138
> > <https://reviews.apache.org/r/28372/diff/8/?file=790874#file790874line138>
> >
> >     I'd rather not use parquet.file at all. Is there a compelling reason to keep
it now that this can find files in the table location?

parquet.file or a similar table property will be required to create a parquet backed managed
table. Hive expects atleast one column in the table, which can either come from user specified
hive schema or from table's serde.


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 2294
> > <https://reviews.apache.org/r/28372/diff/8/?file=790870#file790870line2294>
> >
> >     It isn't obvious why this is done or why deepCopy is used. Could you add a comment?

Done.


> On Dec. 13, 2014, 1:20 a.m., Ryan Blue wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java,
line 96
> > <https://reviews.apache.org/r/28372/diff/8/?file=790873#file790873line96>
> >
> >     I'm not sure we want to assume that an int96 is a timestamp. That's all it's
used for today, but maybe we should make it an opt-in so that users must set the hive type
to timestamp explicitly. That way, we keep from assuming that int96 will always be a timestamp
or building behavior we might have to break later.

User can make hive assume parquet's int96 to map to hive's timestamp by setting "parquet.int96.is.timestamp"
in table properties.


- Ashish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28372/#review65004
-----------------------------------------------------------


On Dec. 13, 2014, 12:03 a.m., Ashish Singh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28372/
> -----------------------------------------------------------
> 
> (Updated Dec. 13, 2014, 12:03 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-8950
>     https://issues.apache.org/jira/browse/HIVE-8950
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-8950: Add support in ParquetHiveSerde to create table schema from a parquet file
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fafd78e63e9b41c9fdb0e017b567dc719d151784

>   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 32186391e7e4cfc9b4d06d7376663e82ec08d9e6

>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java b137fcb86e27ac91ed3c733b4d8788228d379a09

>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 2db2658fbc57fba01c892c9213baef6c498e659b

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetSchemaReader.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ParquetToHiveSchemaConverter.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe736fcf9d3715f03eed9885c299a7aa040dd

>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java cd3d349e8bd8785d6cadaf9ed8fa7598f223774a

>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetToHiveSchemaConverter.java
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_optional_elements_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_required_elements_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_single_field_struct_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema.q PRE-CREATION

>   ql/src/test/queries/clientpositive/parquet_array_of_structs_gen_schema_ext.q PRE-CREATION

>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_avro_array_of_primitives_gen_schema.q PRE-CREATION

>   ql/src/test/queries/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_decimal_gen_schema.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q
PRE-CREATION 
>   ql/src/test/queries/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q
PRE-CREATION 
>   ql/src/test/results/clientpositive/create_view_partitioned.q.out ebf9a6bc4f2321d7f539b7a445b3f279e3285b8a

>   ql/src/test/results/clientpositive/parquet_array_of_multi_field_struct_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_optional_elements_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_required_elements_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_single_field_struct_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/parquet_array_of_structs_gen_schema_ext.q.out PRE-CREATION

>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_groups_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_array_of_unannotated_primitives_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_avro_array_of_primitives_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_avro_array_of_single_field_struct_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_decimal_gen_schema.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_primitives_gen_schema.q.out
PRE-CREATION 
>   ql/src/test/results/clientpositive/parquet_thrift_array_of_single_field_struct_gen_schema.q.out
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/28372/diff/
> 
> 
> Testing
> -------
> 
> Tested by adding appropriate qTests.
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message