kylin-user mailing list archives

From "lk_hadoop"<lk_had...@163.com>
Subject Re: Re:What dose Data Size and Source Table Size mean
Date Fri, 30 Aug 2019 02:50:13 GMT
Thank you very much.

2019-08-30 

lk_hadoop 



From: maoxiaomao <lang--lang--lang@163.com>
Sent: 2019-08-29 15:49
Subject: Re:What dose Data Size and Source Table Size mean
To: "user@kylin.apache.org"<user@kylin.apache.org>
Cc:

Hi lk,
   This is my understanding, and I'm not entirely sure about it (my Kylin version is v2.6.1).
      1. Data Size: for each segment and each MR step, it is the size of the step's output data. It is
one of the MapReduce counters, and can be seen in the log as "HDFS Write" (for Step #1) or "HDFS:
Number of bytes written" (for the other MR steps).
 
      2. Source Table Size: the size of the source data as read in String form; on the web UI
it is the sum over all segments. It comes from a counter of Step #2, Extract Fact Table
Distinct Columns, named "BYTES", of class "org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper$RawDataCounter". It
can be seen at the bottom of the Step #2 log. It is calculated as follows.

      In FactDistinctColumnsMapper.doMap:

      For Hive, parseMapperInput works as:

      And countSizeInBytes is calculated as:
 
     3. Cube Size: the final HFile size, i.e. the "Data Size" of the step Convert Cuboid Data to HFile;
on the web UI it is the sum over all segments.
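
As a rough illustration of what "the size of the source data read as String" in point 2 could mean, here is a minimal sketch. It is not copied from Kylin: the class name, the method names, and the choice to count one byte per column delimiter and one byte for a null column are all assumptions made for the example. It only shows the general idea of adding up the UTF-8 byte length of every column of every row into a single "BYTES"-style total.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Kylin source code.
class SourceSizeSketch {

    // Approximate the size of one row whose columns were read as strings.
    static long rowSizeInBytes(String[] row) {
        long size = 0;
        for (String col : row) {
            // Assumption: a null column is counted as 1 byte.
            size += (col == null) ? 1 : col.getBytes(StandardCharsets.UTF_8).length;
            // Assumption: 1 extra byte per column for the delimiter.
            size += 1;
        }
        return size;
    }

    // Sum the per-row sizes, the way a per-mapper byte counter would accumulate.
    static long totalSizeInBytes(String[][] rows) {
        long total = 0;
        for (String[] row : rows) {
            total += rowSizeInBytes(row);
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"2019-08-28", "kylin", null},
            {"2019-08-29", "cube", "42"}
        };
        System.out.println("BYTES=" + totalSizeInBytes(rows));
    }
}
```

Under these assumptions the total depends on the string form of the data, not on how the table is stored, so it can differ a lot from the size of the files on HDFS when the source table is compressed or uses a columnar format.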




--
At 2019-08-28 14:13:00, "lk_hadoop" <lk_hadoop@163.com> wrote:

Hi all,
I don't quite understand some of the metrics I saw on Kylin's web UI:

#1 Step Name: Create Intermediate Flat Hive Table

Data Size: 72.06 GB 

Duration: 6.07 mins Waiting: 0 seconds

What does "Data Size" mean? Is it the size of all the record data * 2?



What does "Source Table Size" mean?


Thanks for your attention.

2019-08-28


lk_hadoop 