kylin-dev mailing list archives

From hongbin ma <mahong...@apache.org>
Subject Re: How to support Avro Complex Type on Kylin
Date Tue, 01 Dec 2015 08:12:43 GMT
@shaofeng, it looks like a nice feature.

On Tue, Dec 1, 2015 at 4:07 PM, Shi, Shaofeng <shaoshi@ebay.com> wrote:

> Kylin should automatically skip these complex columns instead of blocking
> users from importing the table. What do you think?
>
> On 12/1/15, 3:32 PM, "Xiaoyu Wang" <wangxiaoyu1@jd.com> wrote:
>
> >Yes, you can create a Hive view that excludes the array and map columns.
> >
> >On 2015-12-01 15:26, Yiming Liu wrote:
> >> Thanks Xiaoyu, for the quick response.
> >>
> >>
> >> Currently, there is no way to remove those fields. The error happens at
> >>the first step, "Sync Hive tables", when designing the cube.
> >>
> >>
> >> I will redesign my original tables to fit the datatype requirement.
> >>
> >>
> >> ------------------ Original ------------------
> >> From:  "Xiaoyu Wang";<wangxiaoyu1@jd.com>;
> >> Date:  Tue, Dec 1, 2015 03:20 PM
> >> To:  "dev"<dev@kylin.incubator.apache.org>;
> >>
> >> Subject:  Re: How to support Avro Complex Type on Kylin
> >>
> >>
> >>
> >> Kylin does not support data types like "array" and "map".
> >> Array and map columns cannot be used as dimensions.
> >> You can remove the array and map columns from the cube design and retry.
> >>
> >>On 2015-12-01 15:05, Yiming Liu wrote:
> >>> Hi Kylin experts,
> >>>
> >>> I have an Avro-encoded table with map and array fields. I
> >>>can query the table in Hive.
> >>>
> >>> When I sync the table into Kylin, Kylin reports:
> >>> "bad data type -- array<string>, does not match
> >>> (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
> >>>
> >>> So it seems Kylin does not support Avro complex types; is that right?
> >>>Do you have any suggestions on how to handle complex data types?
> >>>
> >>> SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
> >>> InputFormat:   org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
> >>> OutputFormat:  org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
> >>>
> >>> Following is my table schema:
> >>> 0   sessionid            string
> >>> 1   userid               string
> >>> 2   hosts                array<string>
> >>> 3   domain               string
> >>> 4   visittimes           int
> >>> 5   firsttimestamp       bigint
> >>> 6   lasttimestamp        bigint
> >>> 7   sessiontimestamp     bigint
> >>> 8   useragent            map<string,string>
> >>> 9   srcaddrunsignedint   bigint
> >>> 10  srcaddrstr           string
> >>> 11  srcaddrcity          map<string,string>
> >>> 12  srcaddrlocation      map<string,string>
> >>> 13  destaddrunsignedint  bigint
> >>> 14  destaddrstr          string
> >>> 15  destaddrcity         map<string,string>
> >>> 16  destaddrlocation     map<string,string>
> >>> 17  keywords             map<string,array<string>>
> >>> 18  topics               map<string,double>
> >>> 19  cookies              map<string,string>
> >>> 20  urls                 array<string>
> >>> 21  year                 int
> >>> 22  month                int
> >>> 23  day                  int
> >>> 24  hour                 int
>
>
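The "bad data type" error in the original message can be reproduced with the regular expression it quotes. The Python sketch below is an illustration, not Kylin's code: it assumes Kylin lower-cases the Hive type and requires the whole string to match the pattern (hence re.fullmatch), and is_supported is a hypothetical helper name. Under that assumption, primitive types such as string, bigint and decimal(19,4) pass, while every array<...> or map<...> type is rejected, which is why "Sync Hive tables" fails for this schema.

```python
import re

# Data-type pattern copied from the Kylin error message in this thread.
# Assumption: Kylin matches it against the whole lower-cased Hive type string.
KYLIN_TYPE_PATTERN = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long"
    r"|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
)


def is_supported(hive_type: str) -> bool:
    """Return True if the type string fully matches the quoted pattern (hypothetical helper)."""
    return KYLIN_TYPE_PATTERN.fullmatch(hive_type.strip().lower()) is not None


if __name__ == "__main__":
    for t in ["string", "bigint", "decimal(19,4)", "array<string>", "map<string,string>"]:
        print(f"{t:22} {'ok' if is_supported(t) else 'rejected'}")
```

Running it prints "ok" for the first three types and "rejected" for the two complex ones.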


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
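Xiaoyu's workaround, a Hive view without the array and map columns, can be scripted from the table schema. The sketch below is an illustration, not a Kylin feature: it takes the schema posted in the thread, drops any column whose type starts with array<, map<, struct< or uniontype<, and builds a Hive CREATE VIEW statement. The view name sessions_flat and the source table name sessions are hypothetical, since the thread does not name the table; following Xiaoyu's suggestion, the view rather than the Avro table is what would then be synced into Kylin.

```python
from typing import List, Tuple

# Column names and Hive types from the DESCRIBE output posted in the thread.
SCHEMA: List[Tuple[str, str]] = [
    ("sessionid", "string"), ("userid", "string"), ("hosts", "array<string>"),
    ("domain", "string"), ("visittimes", "int"), ("firsttimestamp", "bigint"),
    ("lasttimestamp", "bigint"), ("sessiontimestamp", "bigint"),
    ("useragent", "map<string,string>"), ("srcaddrunsignedint", "bigint"),
    ("srcaddrstr", "string"), ("srcaddrcity", "map<string,string>"),
    ("srcaddrlocation", "map<string,string>"), ("destaddrunsignedint", "bigint"),
    ("destaddrstr", "string"), ("destaddrcity", "map<string,string>"),
    ("destaddrlocation", "map<string,string>"),
    ("keywords", "map<string,array<string>>"), ("topics", "map<string,double>"),
    ("cookies", "map<string,string>"), ("urls", "array<string>"),
    ("year", "int"), ("month", "int"), ("day", "int"), ("hour", "int"),
]

# Hive complex types; none of them appears in Kylin's quoted type pattern.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type: str) -> bool:
    """True for array, map, struct and union column types."""
    return hive_type.replace(" ", "").lower().startswith(COMPLEX_PREFIXES)


def build_view_ddl(view_name: str, table_name: str, schema: List[Tuple[str, str]]) -> str:
    """Build a Hive CREATE VIEW statement that omits complex-typed columns."""
    keep = [name for name, hive_type in schema if not is_complex(hive_type)]
    if not keep:
        raise ValueError("no primitive columns left to expose")
    # Backticks guard against column names (e.g. year) that collide with Hive keywords.
    select_list = ",\n  ".join(f"`{name}`" for name in keep)
    return f"CREATE VIEW {view_name} AS\nSELECT\n  {select_list}\nFROM {table_name};"


if __name__ == "__main__":
    # Hypothetical view and table names; substitute the real ones.
    print(build_view_ddl("sessions_flat", "sessions", SCHEMA))
```

With the thread's schema, 15 of the 25 columns survive; the ten array and map columns (hosts, useragent, keywords, urls and so on) are left out of the view.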
