kylin-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Unexpected behavior when joinning streaming table and hive table
Date Thu, 14 Feb 2019 02:46:13 GMT
Using hour as the partition column should be fine. From the data, it seems
the declared column sequence is not matched with the persisted data.

Lifan, I see you posted the cube JSON, could you please also provide the
model's JSON? That would help to analysis the problem. Thank you!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




Xiaoxiang Yu <xiaoxiang.yu@kyligence.io> 于2019年2月14日周四 上午10:34写道:

> Hi, lifei
>
> After check your model.json, I found you use "HOUR_START" as your
> partition_date_column, which is not correct.
> I think you should change to "timestamp" and have another try.
>
> Source code at
> https://github.com/apache/kylin/blob/master/source-kafka/src/main/java/org/apache/kylin/source/kafka/TimedJsonStreamParser.java#L111
>
> If you find any mistake, please let me know.
>
> ----------------
> Best wishes,
> Xiaoxiang Yu
>
>
> On [DATE], "[NAME]" <[ADDRESS]> wrote:
>
>     Hello, I am evaluating Kylin and tried to join streaming table and hive
>     table, but now got unexpected behavior.
>
>     All the scripts can be found in
>     https://gist.github.com/OstCollector/a4ac396e3169aa42a416d96db3021195
>     (may need to modify some script to match the environments)
>
>     Environment:
>     Centos 7
>     Hadoop on CDH-5.8
>     dedicated Kafka-2.1 (not included in CDH)
>
>     How to reproduce this problem:
>
>     1. run gen_station.pl to generate dim table data
>     2. run import-data.sh to build dim table in Hive
>     3. run factdata.pl and pipe its output into kafka
>     4. create tables TEST_WEATHER.STATION_INFO(hive)
>     TEST_WEATHER.WEATHER(streaming) in Kylin
>     5. create model and cube in Kylin, join WEATHER.SATION_ID = STATION.ID
>     6. build the cube
>
>     Expected behavior:
>     The cube is built correctly and I can get data when search.
>
>     Actual behavior:
>     On apache-kylin-2.6.0-bin-cdh57: build failed at step #2 (Create
>     Intermediate Flat Hive Table)
>     On apache-kylin-2.5.2-bin-cdh57: got empty cube
>
>     I also tried with this case without streaming, with the format of
> timestamp
>     column changed to "%Y-%m-%d %H:%M:%S", and an additional table to
> store the
>     mapping of timestamp and {hour,day,month,year}_start.
>     In this case, the cube is built as expected.
>
>
>     In both failed cases, the intermediate fact table on Hive built in
> step #2
>     seems to have wrong column order.
>     e.g. on version 2.5.2-cdh57, the schema and content of temp table are
> shown
>     below:
>
>     CREATE EXTERNAL TABLE IF NOT EXISTS
>     kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
>     (
>     DAY_START date
>     ,YEAR_START date
>     ,STATION_ID string
>     ,QUARTER_START date
>     ,MONTH_START date
>     ,TEMPERATURE bigint
>     ,HOUR_START timestamp
>     )
>     STORED AS SEQUENCEFILE
>     LOCATION
>
> 'hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact';
>     ALTER TABLE
>     kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
> SET
>     TBLPROPERTIES('auto.purge'='true');
>
>     hive> select * from
>     kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact
> limit
>     10;
>     OK
>     NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01
>   NULL
>     NULL
>     NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2009-01-01      2009-10-01      2009-12-01      2009-12-31
>   NULL
>     NULL
>     NULL    2010-01-01      2010-01-01      2010-01-01      2010-01-01
>   NULL
>     NULL
>     Time taken: 0.421 seconds, Fetched: 10 row(s)
>
>     While the the content of temp file is:
>     # hdfs dfs -text
>
> hdfs://hz-dev-hdfs-service/user/admin/kylin-2/kylin_metadata/kylin-5dbe40eb-55ba-2245-c0b5-1e9efcb67937/kylin_intermediate_weather_f32241e6_53c6_2949_b737_d9a88a4618df_fact/part-m-00001
>     | head -n 10
>     19/02/13 11:44:12 INFO zlib.ZlibFactory: Successfully loaded &
> initialized
>     native-zlib library
>     19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
>     [.deflate]
>     19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
>     [.deflate]
>     19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
>     [.deflate]
>     19/02/13 11:44:12 INFO compress.CodecPool: Got brand-new decompressor
>     [.deflate]
>             0030322010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:001706
>             0075762010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:002605
>             0113882010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:002963
>             0214922010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:001769
>             0303062010-01-012010-01-012010-01-012010-01-012010-01-01
> 07:00:00432
>             0377712010-01-012010-01-012010-01-012010-01-012010-01-01
> 07:00:00808
>             0443462010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:001400
>             0500512010-01-012010-01-012010-01-012010-01-012010-01-01
> 07:00:00342
>             0537982010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:001587
>             0597122010-01-012010-01-012010-01-012010-01-012010-01-01
>     07:00:00-1309
>     (the '\x01' character is not correctly copied)
>
>     So what am I doing wrong?
>
>     --
>     Sent from: http://apache-kylin.74782.x6.nabble.com/
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message