kylin-user mailing list archives

From "lk_hadoop"<>
Subject Re: Re:What dose Data Size and Source Table Size mean
Date Fri, 30 Aug 2019 02:50:13 GMT
Thank you very much.



From: maoxiaomao <>
Sent: 2019-08-29 15:49
Subject: Re:What dose Data Size and Source Table Size mean

Hi lk,
   This is my understanding, though I'm not completely sure about it (my Kylin version is v2.6.1).
      1. Data Size: for each segment and each MR step, this is the size of the step's output data.
It is also one of the MapReduce counters, which appears in the log as "HDFS Write" (for Step #1)
or "HDFS: Number of bytes written" (for the other MR steps).
      2. Source Table Size: the size of the source data as read in String form at one point in the
build; the website shows the sum over all segments. It comes from a counter of Step #2 (Extract
Fact Table Distinct Columns) named "BYTES" of class "$RawDataCounter", which can be seen at the
bottom of the Step #2 log. It is calculated as follows.

      In FactDistinctColumnsMapper.doMap:

      and for Hive, parseMapperInput works as:

      and countSizeInBytes is calculated as:
      3. Cube Size: the final HFile size, i.e. the "Data Size" of the step "Convert Cuboid Data to
HFile"; the website shows the sum over all segments.
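
To make points 1 and 2 concrete, here is a rough sketch in Python (not Kylin code). It pulls the "HDFS: Number of bytes written" counter out of a step log, and estimates a source size by summing the rows as Strings over all segments. The function names and the sample data are hypothetical. The per-column rule (UTF-8 byte length plus one delimiter byte, with null counted as zero bytes) is only an assumption for illustration and may differ from what countSizeInBytes really does.

```python
import re


def hdfs_bytes_written(log_text):
    """Return the 'HDFS: Number of bytes written' counter from a job log, or None."""
    match = re.search(r"HDFS: Number of bytes written=(\d+)", log_text)
    return int(match.group(1)) if match else None


def row_size_in_bytes(row):
    """Size of one row read as Strings.

    Assumed rule (not taken from the Kylin source): UTF-8 byte length of
    each column value, plus one byte per column for a delimiter.
    """
    total = 0
    for value in row:
        if value is not None:
            total += len(value.encode("utf-8"))
        total += 1  # assumed one-byte delimiter per column
    return total


def source_table_size(segments):
    """Sum the row sizes over every segment, as the website does per cube."""
    return sum(row_size_in_bytes(row) for rows in segments for row in rows)


if __name__ == "__main__":
    # Hypothetical log excerpt in the usual Hadoop counter format.
    sample_log = (
        "\t\tHDFS: Number of bytes read=100\n"
        "\t\tHDFS: Number of bytes written=2048\n"
    )
    print(hdfs_bytes_written(sample_log))

    # Two hypothetical segments, each a list of rows of String columns.
    segments = [[["a", "bc"]], [["xyz", None]]]
    print(source_table_size(segments))
```

The real numbers should still be read from the step logs; this only shows the kind of arithmetic behind them.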

At 2019-08-28 14:13:00, "lk_hadoop" <> wrote:

I don't quite understand some of the metrics I saw on the Kylin web UI:

#1 Step Name: Create Intermediate Flat Hive Table

Data Size: 72.06 GB 

Duration: 6.07 mins Waiting: 0 seconds

What does "Data Size" mean? Is it the size of all the records' data times 2?

What does "Source Table Size" mean?

Thanks for your attention.

