drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1997) Hive generated parquet files with maps containing strings return wrong value
Date Tue, 13 Jan 2015 21:51:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276011#comment-14276011 ]

Steven Phillips commented on DRILL-1997:
----------------------------------------

I actually think this is correct. The schema of the file:

message hive_schema {
  optional int32 c1;
  optional boolean c2;
  optional double c3;
  optional binary c4;
  optional group c5 (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
  optional group c6 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional binary value;
    }
  }
  optional group c7 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key;
      optional binary value;
    }
  }
  optional group c8 {
    optional binary r;
    optional int32 s;
    optional double t;
  }
  optional int32 c9;
  optional int32 c10;
  optional float c11;
  optional int64 c12;
  optional group c13 (LIST) {
    repeated group bag {
      optional group array_element (LIST) {
        repeated group bag {
          optional binary array_element;
        }
      }
    }
  }
  optional group c15 {
    optional int32 r;
    optional group s {
      optional int32 a;
      optional binary b;
    }
  }
  optional group c16 (LIST) {
    repeated group bag {
      optional group array_element {
        optional group m (MAP) {
          repeated group map (MAP_KEY_VALUE) {
            required binary key;
            optional binary value;
          }
        }
        optional int32 n;
      }
    }
  }
}

The string value in c6 is simply stored as binary, with no metadata indicating that it is
a UTF-8 encoded string. I think this indicates that Hive currently does not support the UTF8
converted type. In sqlline, we use JSON to display a complex object, and binary values
are displayed as base64 in JSON.
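
To check this reading, the values in the sqlline output can be base64-decoded by hand. Below is a minimal Python sketch; the helper name decode_map_values is made up for illustration, the row is copied from the output quoted in the issue, and treating the decoded bytes as UTF-8 is an assumption, since the file itself carries no such annotation.

```python
import base64
import json

# Row copied from the sqlline output quoted in the issue description.
drill_row = '{"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]}'


def decode_map_values(row_json: str) -> dict:
    """Base64-decode each map value and read the bytes as UTF-8.

    Hypothetical helper for illustration only. UTF-8 is an assumption:
    the Parquet schema marks these values as plain binary.
    """
    entries = json.loads(row_json)["map"]
    return {
        entry["key"]: base64.b64decode(entry["value"]).decode("utf-8")
        for entry in entries
    }


decoded = decode_map_values(drill_row)
print(decoded)  # {1: 'x', 2: 'y'}
```

The decoded result matches the {1:"x",2:"y"} that Hive prints for the same row, which suggests the data is read correctly and only the display of binary values differs.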

> Hive generated parquet files with maps containing strings return wrong value
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-1997
>                 URL: https://issues.apache.org/jira/browse/DRILL-1997
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Ramana Inukonda Nagaraj
>            Assignee: Parth Chandra
>            Priority: Critical
>         Attachments: hive_alltypes.parquet
>
>
> Created a parquet file in hive having the following DDL
> hive> desc alltypesparquet;          
> OK
> c1                  	int                 	                    
> c2                  	boolean             	                    
> c3                  	double              	                    
> c4                  	string              	                    
> c5                  	array<int>          	                    
> c6                  	map<int,string>     	                    
> c7                  	map<string,string>  	                    
> c8                  	struct<r:string,s:int,t:double>	                    
> c9                  	tinyint             	                    
> c10                 	smallint            	                    
> c11                 	float               	                    
> c12                 	bigint              	                    
> c13                 	array<array<string>>	                    
> c15                 	struct<r:int,s:struct<a:int,b:string>>	
> c16                 	array<struct<m:map<string,string>,n:int>>	
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> All the complex types with string in them are returning incorrect values in drill. For
> example:
> hive> select c6 from alltypesparquet;
> NULL
> NULL
> {1:"x",2:"y"}
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> +------------+
> |     c6     |
> +------------+
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> +------------+
> 3 rows selected (0.077 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
