kylin-dev mailing list archives

From Xiaoyu Wang <wangxiao...@jd.com>
Subject Re: How to support Avro Complex Type on Kylin
Date Tue, 01 Dec 2015 08:12:12 GMT
Yes, agreed. The JIRA is https://issues.apache.org/jira/browse/KYLIN-1111
I will try to do it and submit a patch.
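Here is a minimal sketch of the behaviour proposed in this thread. It assumes the goal is simply to drop complex-typed columns before the table reaches Kylin. Given a Hive schema as (name, type) pairs, it keeps the primitive columns and emits a `CREATE VIEW` over them, which is the Hive-view workaround suggested further down. The function names, the `sessions`/`sessions_flat` table names and the list of complex type prefixes are illustrative assumptions, not Kylin or Hive APIs.

```python
from typing import List, Tuple

# Hive complex type constructors. Assumption: any column whose type starts
# with one of these is one Kylin cannot load, so the view leaves it out.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type: str) -> bool:
    """Return True if the Hive type string names a complex type."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_flat_view(table: str, view: str, columns: List[Tuple[str, str]]) -> str:
    """Build a Hive CREATE VIEW statement exposing only primitive columns."""
    kept = [name for name, hive_type in columns if not is_complex(hive_type)]
    if not kept:
        raise ValueError("no primitive columns left to expose")
    # Backticks quote identifiers such as `year` that may clash with keywords.
    select_list = ",\n  ".join(f"`{name}`" for name in kept)
    return f"CREATE VIEW {view} AS\nSELECT\n  {select_list}\nFROM {table};"


# A subset of the schema from the original question (hypothetical table names).
schema = [
    ("sessionid", "string"),
    ("hosts", "array<string>"),
    ("visittimes", "int"),
    ("useragent", "map<string,string>"),
    ("keywords", "map<string,array<string>>"),
    ("year", "int"),
]
ddl = build_flat_view("sessions", "sessions_flat", schema)
print(ddl)
```

If you sync the view instead of the raw Avro table in "Sync Hive tables", the type error should go away. The trade-off is that the complex fields are no longer available to the cube.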

On Dec 1, 2015, at 16:07, Shi, Shaofeng wrote:
> Kylin should automatically skip these complex columns instead of blocking
> the user from importing the table. What do you think?
>
> On 12/1/15, 3:32 PM, "Xiaoyu Wang" <wangxiaoyu1@jd.com> wrote:
>
>> Yes. You can create a Hive view that leaves out the array and map columns.
>>
>> On Dec 1, 2015, at 15:26, Yiming Liu wrote:
>>> Thanks, Xiaoyu, for the quick response.
>>>
>>> Currently, there is no way to remove those fields from the cube design:
>>> the error happens at the first step, "Sync Hive tables", when designing
>>> the cube.
>>>
>>> I will redesign my original tables to fit the data type requirements.
>>>
>>> ------------------ Original ------------------
>>> From: "Xiaoyu Wang" <wangxiaoyu1@jd.com>
>>> Date: Tue, Dec 1, 2015 03:20 PM
>>> To: "dev" <dev@kylin.incubator.apache.org>
>>> Subject: Re: How to support Avro Complex Type on Kylin
>>>
>>> Kylin does not support data types such as "array" and "map".
>>> Array and map columns can't be used as dimensions.
>>> You can remove the array and map columns from the cube design and retry.
>>>
>>> On Dec 1, 2015, at 15:05, Yiming Liu wrote:
>>>> Hi Kylin experts,
>>>>
>>>> I have a table with Avro encoding. It has map and array field types. I
>>>> can query the table in Hive.
>>>>
>>>> When I sync the table into Kylin, Kylin reports:
>>>> "bad data type -- array&lt;string&gt;, does not match
>>>> (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal
>>>> |numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|l
>>>> ong|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
>>>>
>>>> So it seems Kylin does not support Avro complex types; is that right?
>>>> Do you have any suggestions on how to handle complex data types?
>>>>
>>>> SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
>>>> InputFormat:   org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
>>>> OutputFormat:  org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
>>>>
>>>> The following is my table schema:
>>>>
>>>>  #  Column               Type
>>>>  0  sessionid            string
>>>>  1  userid               string
>>>>  2  hosts                array<string>
>>>>  3  domain               string
>>>>  4  visittimes           int
>>>>  5  firsttimestamp       bigint
>>>>  6  lasttimestamp        bigint
>>>>  7  sessiontimestamp     bigint
>>>>  8  useragent            map<string,string>
>>>>  9  srcaddrunsignedint   bigint
>>>> 10  srcaddrstr           string
>>>> 11  srcaddrcity          map<string,string>
>>>> 12  srcaddrlocation      map<string,string>
>>>> 13  destaddrunsignedint  bigint
>>>> 14  destaddrstr          string
>>>> 15  destaddrcity         map<string,string>
>>>> 16  destaddrlocation     map<string,string>
>>>> 17  keywords             map<string,array<string>>
>>>> 18  topics               map<string,double>
>>>> 19  cookies              map<string,string>
>>>> 20  urls                 array<string>
>>>> 21  year                 int
>>>> 22  month                int
>>>> 23  day                  int
>>>> 24  hour                 int
>
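You can reproduce the rejection outside Kylin by checking each Hive type string against the pattern printed in the error message above, re-joined where the mail client wrapped it. This is a sketch. It assumes the whole type string must match, which is consistent with the error but is not taken from Kylin's source. The helper name `kylin_accepts` is hypothetical.

```python
import re

# Pattern copied from the Kylin error message in the first mail.
KYLIN_TYPE_PATTERN = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short"
    r"|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
)


def kylin_accepts(hive_type: str) -> bool:
    """Hypothetical check: True if the whole type string matches the pattern."""
    return KYLIN_TYPE_PATTERN.fullmatch(hive_type.strip().lower()) is not None


# Columns taken from the schema in the first mail.
for column, hive_type in [
    ("hosts", "array<string>"),
    ("topics", "map<string,double>"),
    ("visittimes", "int"),
    ("firsttimestamp", "bigint"),
]:
    print(column, hive_type, "accepted" if kylin_accepts(hive_type) else "rejected")
```

Every `array<...>` and `map<...>` column in the schema fails this check, because no alternative in the pattern starts with `array` or `map`. Length or precision suffixes such as `decimal(10,2)` or `varchar(256)` are accepted by the optional parenthesised group at the end of the pattern.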

