hawq-dev mailing list archives

From Oleksandr Diachenko <odiache...@pivotal.io>
Subject [DISCUSS] Hive Complex types in \d
Date Fri, 22 Apr 2016 21:18:02 GMT
Hi,

I am working on https://issues.apache.org/jira/browse/HAWQ-703 and wanted
to get some opinions.

Currently, HAWQ supports \d for Hive tables with primitive types only (int,
boolean, date, etc.).
If a user tries to describe a Hive table that has complex types (array,
map, struct, uniontype), an error occurs.

*Example:*

The user has a Hive table:

    hive> describe hive_small_data;
    OK
    s1                  string
    s2                  string
    n1                  int
    d1                  double

The user can describe this table in psql:

    # \d hcatalog.default.hive_small_data
    PXF Hive Table "default.hive_small_data"
     Column |  Type
    --------+--------
     s1     | text
     s2     | text
     n1     | int4
     d1     | float8

Thus psql shows the columns and their types mapped to HAWQ types (string ->
text, int -> int4).

The goal of HAWQ-703 is to make Hive tables with complex columns queryable
by representing those complex types as TEXT. The long-term goal is to map
Hive's complex types to HAWQ types where applicable.
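
As a rough illustration of the mapping described above, here is a minimal Python sketch. It is not HAWQ or PXF code: the function and table names are hypothetical, and the primitive mappings are limited to the ones shown in this message (string, int, float, double). Complex types fall back to text, as proposed for HAWQ-703.

```python
# Hypothetical sketch of the Hive -> HAWQ type mapping discussed in this thread.
# None of these names come from HAWQ or PXF; they are for illustration only.

# Primitive mappings taken from the examples in this message.
PRIMITIVE_MAP = {
    "string": "text",
    "int": "int4",
    "float": "float4",
    "double": "float8",
}

# Hive complex types, represented as TEXT under the HAWQ-703 proposal.
COMPLEX_TYPES = ("array", "map", "struct", "uniontype")


def hive_to_hawq_type(hive_type: str) -> str:
    """Return the HAWQ type name used to display a Hive column type."""
    normalized = hive_type.strip().lower()
    if normalized in PRIMITIVE_MAP:
        return PRIMITIVE_MAP[normalized]
    # "array<string>" -> "array", "struct<a:int>" -> "struct"
    base = normalized.split("<", 1)[0].strip()
    if base in COMPLEX_TYPES:
        return "text"
    raise ValueError(f"unsupported Hive type: {hive_type!r}")
```

With this sketch, `hive_to_hawq_type("array<string>")` and `hive_to_hawq_type("uniontype<int,string>")` both return `"text"`, so uniontype is handled even though it has no HAWQ counterpart.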

*Disclaimer*:

The proposed changes will affect only Hive tables; behavior for all other
objects will remain the same.

*The problems:*

   1. Not all Hive complex types can be mapped to HAWQ types. For example,
   uniontype has no corresponding HAWQ type.
   2. Even if we map Hive types to HAWQ types, displaying only the HAWQ type
   in \d might not be enough for the user and could even be confusing.


*Possible options:*

   1. \d behavior for Hive tables remains the same: the Type column shows
   the HAWQ type. \d+ shows an additional column, Source Type, so that
   Type=HAWQ type and Source Type=Hive type.

      # \d hcatalog.default.reg_collections;
      PXF Hive Table "default.reg_collections"
       Column |  Type
      --------+--------
       s1     | text
       f1     | float4
       a1     | text
       m1     | text
       sr1    | text

      # \d+ hcatalog.default.reg_collections;
      PXF Hive Table "default.reg_collections"
       Column |  Type  | Source Type
      --------+--------+---------------------------------
       s1     | text   | string
       f1     | float4 | float
       a1     | text   | array<string>
       m1     | text   | map<string, float>
       sr1    | text   | struct<a:string,b:string,c:int>

   2. \d shows three columns (Column, Type, Source Type), where
   Type=HAWQ type, Source Type=Hive type.

      # \d hcatalog.default.reg_collections;
      PXF Hive Table "default.reg_collections"
       Column |  Type  | Source Type
      --------+--------+---------------------------------
       s1     | text   | string
       f1     | float4 | float
       a1     | text   | array<string>
       m1     | text   | map<string, float>
       sr1    | text   | struct<a:string,b:string,c:int>

   3. \d shows Column, Type, where Type=Hive type.

      # \d hcatalog.default.reg_collections;
      PXF Hive Table "default.reg_collections"
       Column | Type
      --------+---------------------------------
       s1     | string
       f1     | float
       a1     | array<string>
       m1     | map<string, float>
       sr1    | struct<a:string,b:string,c:int>
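
To make the difference between \d and \d+ in option 1 concrete, here is a small Python sketch that renders both views from the same column metadata. It is only an illustration: `describe` and `REG_COLLECTIONS` are hypothetical names, and the real change would live in psql's describe code, not in Python.

```python
# Hypothetical sketch of option 1: \d shows HAWQ types only, \d+ adds the
# Hive source type. Names here are illustrative and not part of HAWQ or psql.

# (column name, HAWQ type, Hive source type), as in the example table above.
REG_COLLECTIONS = [
    ("s1", "text", "string"),
    ("f1", "float4", "float"),
    ("a1", "text", "array<string>"),
    ("m1", "text", "map<string, float>"),
    ("sr1", "text", "struct<a:string,b:string,c:int>"),
]


def describe(table, columns, verbose=False):
    """Render a psql-like listing; verbose=True mimics \\d+."""
    headers = ["Column", "Type"]
    if verbose:
        headers.append("Source Type")
    # Keep only as many fields per column as there are headers.
    rows = [list(col[:len(headers)]) for col in columns]
    widths = [
        max([len(h)] + [len(r[i]) for r in rows])
        for i, h in enumerate(headers)
    ]
    lines = [f'PXF Hive Table "{table}"']
    header = " | ".join(h.center(w) for h, w in zip(headers, widths))
    lines.append(" " + header.rstrip())
    lines.append("+".join("-" * (w + 2) for w in widths))
    for row in rows:
        cells = " | ".join(c.ljust(w) for c, w in zip(row, widths))
        lines.append(" " + cells.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe("default.reg_collections", REG_COLLECTIONS))
    print()
    print(describe("default.reg_collections", REG_COLLECTIONS, verbose=True))
```

Under option 1 the default output stays the same as for any other table, and the Hive-specific detail only appears when the user asks for it with \d+.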

I would prefer option 1.
Any thoughts/opinions?


Regards, Alex.
