kylin-dev mailing list archives

From "Shi, Shaofeng" <shao...@ebay.com>
Subject Re: How to support Avro Complex Type on Kylin
Date Tue, 01 Dec 2015 08:07:04 GMT
Kylin should automatically skip these complex columns instead of blocking
the user from importing the table. What do you think?
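
A minimal sketch of the "skip instead of block" idea, in Python for illustration only. It reuses the type pattern quoted in the error message further down this thread. The function name split_columns, the full-match semantics, and the case-insensitive flag are assumptions made for this sketch; they are not taken from Kylin's source.

```python
import re

# Pattern copied from the "bad data type" error message quoted in this thread.
# Case-insensitive full matching is an assumption of this sketch.
SUPPORTED_TYPE = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short"
    r"|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?",
    re.IGNORECASE,
)


def split_columns(columns):
    """Partition (name, hive_type) pairs into (kept, skipped) lists.

    A column is kept when its whole type string matches the pattern,
    and skipped otherwise (for example array<...> or map<...>).
    """
    kept, skipped = [], []
    for name, hive_type in columns:
        if SUPPORTED_TYPE.fullmatch(hive_type.strip()):
            kept.append((name, hive_type))
        else:
            skipped.append((name, hive_type))
    return kept, skipped


if __name__ == "__main__":
    sample = [
        ("sessionid", "string"),
        ("hosts", "array<string>"),
        ("visittimes", "int"),
        ("useragent", "map<string,string>"),
    ]
    kept, skipped = split_columns(sample)
    print("kept:   ", [name for name, _ in kept])
    print("skipped:", [name for name, _ in skipped])
```

With this approach the sync step could import the kept columns and report the skipped ones as a warning rather than failing the whole table.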

On 12/1/15, 3:32 PM, "Xiaoyu Wang" <wangxiaoyu1@jd.com> wrote:

>Yes. You can create a Hive view that leaves out the array and map columns.
>
>On 2015-12-01 15:26, Yiming Liu wrote:
>> Thanks Xiaoyu, for the quick response.
>>
>>
>> Currently, there is no way to remove those fields. The error happens in
>> the first step, "Sync Hive tables", when designing the cube.
>>
>>
>> I will redesign my original tables to fit the data type requirements.
>>
>>
>> ------------------ Original ------------------
>> From:  "Xiaoyu Wang";<wangxiaoyu1@jd.com>;
>> Date:  Tue, Dec 1, 2015 03:20 PM
>> To:  "dev"<dev@kylin.incubator.apache.org>;
>>
>> Subject:  Re: How to support Avro Complex Type on Kylin
>>
>>
>>
>> Kylin does not support data types such as "array" and "map".
>> A column of array or map type cannot be set as a dimension.
>> You can remove the array and map columns from the cube design and retry.
>>
>> On 2015-12-01 15:05, Yiming Liu wrote:
>>> Hi Kylin experts,
>>>
>>> I have a table stored in Avro format. It has fields of map and array
>>> type. I can query the table in Hive.
>>>
>>> When I sync the table into Kylin, Kylin reports:
>>> "bad data type -- array<string>, does not match
>>> (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
>>>
>>> So it seems Kylin does not support Avro complex types. Is that right?
>>> Do you have any suggestions on how to process complex data types?
>>>
>>> SerDe Library:	org.apache.hadoop.hive.serde2.avro.AvroSerDe
>>> InputFormat:	org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
>>> OutputFormat:	org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
>>>
>>> Following is my table schema:
>>> 0		sessionid	string	
>>> 1		userid	string	
>>> 2		hosts	array<string>	
>>> 3		domain	string	
>>> 4		visittimes	int	
>>> 5		firsttimestamp	bigint	
>>> 6		lasttimestamp	bigint	
>>> 7		sessiontimestamp	bigint	
>>> 8		useragent	map<string,string>	
>>> 9		srcaddrunsignedint	bigint	
>>> 10		srcaddrstr	string	
>>> 11		srcaddrcity	map<string,string>	
>>> 12		srcaddrlocation	map<string,string>	
>>> 13		destaddrunsignedint	bigint	
>>> 14		destaddrstr	string	
>>> 15		destaddrcity	map<string,string>	
>>> 16		destaddrlocation	map<string,string>	
>>> 17		keywords	map<string,array<string>>	
>>> 18		topics	map<string,double>	
>>> 19		cookies	map<string,string>	
>>> 20		urls	array<string>	
>>> 21		year	int	
>>> 22		month	int	
>>> 23		day	int	
>>> 24		hour	int
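
The Hive view workaround suggested in this thread can be sketched as follows. This is an illustration under stated assumptions, not a tested recipe: the table name "sessions", the view name "sessions_flat", and the helper build_flat_view_ddl are hypothetical (the thread never names the table), and the list of complex-type prefixes is this sketch's own choice. The script only generates the CREATE VIEW statement; running it in Hive is left to the reader.

```python
# Hive complex types that the view should leave out (an assumption of this sketch).
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def build_flat_view_ddl(table, view, columns):
    """Return a HiveQL CREATE VIEW statement selecting only non-complex columns.

    `columns` is a list of (name, hive_type) pairs, e.g. taken from DESCRIBE output.
    """
    simple = [
        name
        for name, hive_type in columns
        if not hive_type.strip().lower().startswith(COMPLEX_PREFIXES)
    ]
    if not simple:
        raise ValueError("no non-complex columns left to select")
    col_list = ",\n  ".join(f"`{name}`" for name in simple)
    return f"CREATE VIEW `{view}` AS\nSELECT\n  {col_list}\nFROM `{table}`"


if __name__ == "__main__":
    # A few columns from the schema posted above; the table and view names are hypothetical.
    schema = [
        ("sessionid", "string"),
        ("userid", "string"),
        ("hosts", "array<string>"),
        ("domain", "string"),
        ("visittimes", "int"),
        ("useragent", "map<string,string>"),
        ("keywords", "map<string,array<string>>"),
        ("year", "int"),
    ]
    print(build_flat_view_ddl("sessions", "sessions_flat", schema))
```

The resulting view exposes only the string, int, and bigint columns, so it should pass the type check that rejected the original table when synced in place of it.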

