kylin-dev mailing list archives

From "Chao Long" <wayn...@qq.com>
Subject Re: Unexpected behavior when joining streaming table and hive table
Date Mon, 18 Feb 2019 06:48:28 GMT
Hi lifan,
  I reproduced this problem with the sample streaming cube and created a JIRA issue for it: https://issues.apache.org/jira/browse/KYLIN-3815.
If you have already fixed it, a PR is welcome.
------------------
Best Regards,
Chao Long


------------------ Original ------------------
From:  "lifan.su"<sulifan@lvwan.com>;
Date:  Wed, Feb 13, 2019 04:30 PM
To:  "dev"<dev@kylin.apache.org>;

Subject:  Unexpected behavior when joining streaming table and hive table



Hello, I am evaluating Kylin and tried to join a streaming table with a Hive
table, but got unexpected behavior.

All the scripts can be found at
https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195
(you may need to modify some scripts to match your environment).

Environment:
CentOS 7
Hadoop on CDH 5.8
standalone Kafka 2.1 (not bundled with CDH)

How to reproduce this problem:

1. run gen_station.pl to generate the dim table data
2. run import-data.sh to build the dim table in Hive
3. run factdata.pl and pipe its output into Kafka
4. create tables TEST_WEATHER.STATION_INFO (Hive) and
TEST_WEATHER.WEATHER (streaming) in Kylin
5. create the model and cube in Kylin, join WEATHER.STATION_ID = STATION.ID
6. build the cube
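For step 3, the generator's output is simply piped into a Kafka console producer. A self-contained sketch of that pipe (the broker address and topic name here are assumptions, not taken from the gist; adjust them to your environment):

```python
import subprocess

def pipe_generator_to_producer(generator_cmd, producer_cmd):
    """Pipe one command's stdout into another command's stdin,
    as in: perl factdata.pl | kafka-console-producer ..."""
    gen = subprocess.Popen(generator_cmd, stdout=subprocess.PIPE)
    result = subprocess.run(producer_cmd, stdin=gen.stdout,
                            capture_output=True, text=True)
    gen.wait()
    return result

# Step 3 of the reproduction would then look like (hypothetical
# broker and topic):
# pipe_generator_to_producer(
#     ["perl", "factdata.pl"],
#     ["kafka-console-producer.sh",
#      "--broker-list", "localhost:9092", "--topic", "weather"])
```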

Expected behavior:
The cube is built correctly and I can get data when querying.

Actual behavior:
On apache-kylin-2.6.0-bin-cdh57: the build failed at step #2 (Create
Intermediate Flat Hive Table).
On apache-kylin-2.5.2-bin-cdh57: got an empty cube.

I also tried this case without streaming, with the format of the timestamp
column changed to "%Y-%m-%d %H:%M:%S" and an additional table storing the
mapping from each timestamp to its {hour,day,month,year}_start values.
In that case, the cube was built as expected.
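The mapping described above can be sketched as follows (a minimal illustration, not the actual helper table from the gist; QUARTER_START is included because the cube's flat table below also selects it):

```python
from datetime import datetime

def derive_starts(ts_text):
    """Derive the *_start values for a '%Y-%m-%d %H:%M:%S' timestamp.
    Names and semantics are assumed to mirror Kylin's derived time
    columns (hour/day/month/quarter/year period starts)."""
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
    return {
        "hour_start": ts.replace(minute=0, second=0),
        "day_start": ts.date(),
        "month_start": ts.date().replace(day=1),
        # First month of the quarter: 1, 4, 7 or 10.
        "quarter_start": ts.date().replace(
            month=(ts.month - 1) // 3 * 3 + 1, day=1),
        "year_start": ts.date().replace(month=1, day=1),
    }
```

For example, a fact row stamped "2009-12-31 23:59:59" maps to day_start 2009-12-31, month_start 2009-12-01, quarter_start 2009-10-01 and year_start 2009-01-01, which matches the date columns visible in the SELECT output further below.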


In both failed cases, the intermediate fact table built on Hive in step #2
seems to have the wrong column order.
For example, on version 2.5.2-cdh57, the schema and contents of the temp
table are shown below:

CREATE EXTERNAL TABLE IF NOT EXISTS
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
(
DAY_START date
,YEAR_START date
,STATION_ID string
,QUARTER_START date
,MONTH_START date
,TEMPERATURE bigint
,HOUR_START timestamp
)
STORED AS SEQUENCEFILE
LOCATION
'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact';
ALTER TABLE
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact SET
TBLPROPERTIES('auto.purge'='true');

hive> select * from
kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact limit
10;
OK
NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01      NULL    NULL
NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31      NULL    NULL
NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01      NULL    NULL
Time taken: 0.421 seconds, Fetched: 10 row(s)

While the content of the temp file is:
# hdfs dfs -text
hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-00001
| head -n 10
19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
[.deflate]
        0030322010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001706
        0075762010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002605
        0113882010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:002963
        0214922010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001769
        0303062010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00432
        0377712010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00808
        0443462010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001400
        0500512010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00342
        0537982010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:001587
        0597122010-01-012010-01-012010-01-012010-01-012010-01-01 07:00:00-1309
(the '\x01' separator characters were not copied correctly into this message)
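To illustrate the mismatch, here is a small sketch (not from the gist; the '\x01' separator and the field order are inferred from the dump above) showing how one raw row, split on '\x01', lines up against the flat table's declared column order. Every field that lands on a column of the wrong type reads back as NULL, which matches the SELECT output above:

```python
from datetime import datetime

# Column order declared in the intermediate flat table DDL above.
DDL_COLS = [("DAY_START", "date"), ("YEAR_START", "date"),
            ("STATION_ID", "string"), ("QUARTER_START", "date"),
            ("MONTH_START", "date"), ("TEMPERATURE", "bigint"),
            ("HOUR_START", "timestamp")]

# One raw row from part-m-00001 with the '\x01' separators restored;
# the field order inferred from the dump is: station id, four dates,
# a timestamp, and a temperature.
raw_row = "\x01".join(["003032", "2010-01-01", "2010-01-01", "2010-01-01",
                       "2010-01-01", "2010-01-01 07:00:00", "1706"])

FORMATS = {"date": "%Y-%m-%d", "timestamp": "%Y-%m-%d %H:%M:%S"}

def coerce(value, hive_type):
    """Mimic Hive's lenient reads: return None (i.e. NULL) when the
    text does not match the declared column type."""
    if hive_type == "string":
        return value
    if hive_type == "bigint":
        return int(value) if value.lstrip("-").isdigit() else None
    try:
        return datetime.strptime(value, FORMATS[hive_type])
    except ValueError:
        return None

parsed = {col: coerce(value, hive_type)
          for (col, hive_type), value in zip(DDL_COLS, raw_row.split("\x01"))}
# DAY_START receives the station id, TEMPERATURE the timestamp text,
# and HOUR_START the temperature number -- all three come back NULL,
# while the four date columns in the middle still parse.
```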

So what am I doing wrong?

--
Sent from: http://apache-kylin.74782.x6.nabble.com/