kylin-commits mailing list archives

From mahong...@apache.org
Subject svn commit: r1730998 - in /kylin/site: blog/2016/02/18/ blog/2016/02/18/new-agg/ blog/2016/02/18/new-agg/index.html blog/index.html docs/index.html feed.xml
Date Thu, 18 Feb 2016 03:57:21 GMT
Author: mahongbin
Date: Thu Feb 18 03:57:21 2016
New Revision: 1730998

URL: http://svn.apache.org/viewvc?rev=1730998&view=rev
Log:
doc agg

Added:
    kylin/site/blog/2016/02/18/
    kylin/site/blog/2016/02/18/new-agg/
    kylin/site/blog/2016/02/18/new-agg/index.html
Modified:
    kylin/site/blog/index.html
    kylin/site/docs/index.html
    kylin/site/feed.xml

Added: kylin/site/blog/2016/02/18/new-agg/index.html
URL: http://svn.apache.org/viewvc/kylin/site/blog/2016/02/18/new-agg/index.html?rev=1730998&view=auto
==============================================================================
--- kylin/site/blog/2016/02/18/new-agg/index.html (added)
+++ kylin/site/blog/2016/02/18/new-agg/index.html Thu Feb 18 03:57:21 2016
@@ -0,0 +1,367 @@
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+<!doctype html>
+<html>
+	<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+<head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+
+  <title>Apache Kylin | New Aggregation Group</title>
+  <meta name="description" content="Full title of this article: New Aggregation Group Design to Tackle Curse of Dimension Problem (Especially when high cardinality dimensions exist)">
+  <meta name="author"      content="Apache Kylin">
+  <link rel="shortcut icon" href="fav.png" type="image/png">
+
+
+
+<link rel="stylesheet" href="/assets/css/animate.css">
+<!-- Bootstrap -->
+<link rel="stylesheet" href="/assets/css/bootstrap.min.css">
+
+<!-- Fonts -->
+<!-- <link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Alice|Open+Sans:400,300,700"> -->
+
+<!-- Icons -->
+<link rel="stylesheet" href="/assets/css/font-awesome.min.css">
+
+  <!-- Custom styles -->
+  <link rel="stylesheet" href="/assets/css/styles.css">
+  <link rel="stylesheet" href="/assets/css/docs.css">
+  <link rel="stylesheet" href="/assets/css/pygments.css">
+
+  <link rel="canonical" href="http://kylin.apache.org/blog/2016/02/18/new-agg/">
+  <link rel="alternate" type="application/rss+xml" title="Apache Kylin" href="http://kylin.apache.org/feed.xml" />
+
+<!--[if lt IE 9]> <script src="assets/js/html5shiv.js"></script> <![endif]-->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+  //original tracker for kylin.io
+  ga('create', 'UA-55534813-1', 'auto');
+  //new tracker for kylin.apache.org
+  ga('create', 'UA-55534813-2', 'auto', {'name':'toplevel'});
+
+  ga('send', 'pageview');
+  ga('toplevel.send', 'pageview');
+
+
+</script>
+<script type="text/javascript" src="/assets/js/jquery-1.9.1.min.js"></script>
+<script type="text/javascript" src="/assets/js/nside.js"></script>
+<script type="text/javascript" src="/assets/js/nnav.js"></script>
+</head>
+
+	<body>
+		<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+<header id="header" >
+  
+  <div id="head" class="parallax" parallax-speed="3" >
+    <div id="logo" class="text-center"> <img class="img-circle" id="circlelogo" src="/assets/images/kylin_logo.jpg"> <span class="title" >Apache Kylin™</span> <span class="tagline">Extreme OLAP Engine for Big Data</span> 
+    </div>
+  </div>
+  
+
+  <!-- Main Menu -->
+  <nav class="navbar navbar-default" role="navigation" id="nav-wrapper">
+  <div class="container-fluid" id="nav">
+    <!--
+    <img class="img-circle" width="40px" height="40px" id="circlelogo" src="/assets/images/kylin_logo.jpg">
+    -->
+    <!-- Brand and toggle get grouped for better mobile display -->
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+     
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
+      <ul class="nav navbar-nav">
+     <li><a href="/">Home</a></li>
+          <li><a href="/docs" >Docs</a></li>
+          <li><a href="/download">Download</a></li>
+          <li><a href="/community" >Community</a></li>
+          <li><a href="/development" >Development</a></li>
+          <li><a href="/blog">Blog</a></li>
+          <li><a href="/cn" >中文版</a></li>  
+          <li><a href="https://twitter.com/apachekylin" target="_blank" class="fa fa-twitter fa-lg" title="Twitter: @ApacheKylin" ></a></li>
+          <li><a href="https://github.com/apache/kylin" target="_blank" class="fa fa-github-alt fa-lg" title="Github: apache/kylin" ></a></li>          
+          <li><a href="https://www.facebook.com/kylinio" target="_blank" class="fa fa-facebook fa-lg" title="Facebook: kylin.io" ></a></li>   
+      </ul>      
+    </div><!-- /.navbar-collapse -->
+  </div><!-- /.container-fluid -->
+</nav>
+ </header>
+
+		<div class="page-content">
+			<header style=" padding:2em 0 0 0">
+			<div class="container" >
+				<h4 class="section-title"><span>Apache Kylin™ Technical Blog</span></h4>
+			</div>
+		</div>
+
+		<div class="container">
+			<div>
+				<article class="post-content" >	
+				<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+<div class="post" style=" padding:2em 4em 4em 4em">
+
+  <header class="post-header">
+    <h1 class="post-title">New Aggregation Group</h1>
+    <p class="post-meta" >Feb 18, 2016 • Hongbin Ma</p>
+  </header>
+
+  <article class="post-content" >
+    <p>Full title of this article: <strong>New Aggregation Group Design to Tackle Curse of Dimension Problem (Especially when high cardinality dimensions exist)</strong></p>
+
+<h2 id="abstract">Abstract</h2>
+
+<p>The curse of dimensionality is an infamous problem for all OLAP engines based on pre-calculation. In versions prior to 2.1, Kylin tried to address the problem with a few simple techniques, which relieved it to some degree. During our open source practice, we found that these techniques lacked systematic design thinking and were incapable of addressing many common issues. In Kylin 2.1 we redesigned the aggregation group mechanism so that it better serves all kinds of cube design scenarios.</p>
+
+<h2 id="introduction">Introduction</h2>
+
+<p>It is a known fact that Kylin speeds up query performance by pre-calculating cubes, which in turn contain different combinations of all dimensions, a.k.a. cuboids. The problem is that the number of cuboids grows exponentially with the number of dimensions. For example, there are 8 possible cuboids in total for a cube with 3 dimensions, but 16 possible cuboids for a cube with 4 dimensions; in general, n dimensions yield 2<sup>n</sup> cuboids. Even though Kylin uses a scalable computation framework (MapReduce) and scalable storage (HBase) to compute and store the cubes, it is still unacceptable if the cube size turns out to be many times bigger than the original data source.</p>
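+
+<p>The growth described above can be checked with a minimal Python sketch. The helper name <code class="highlighter-rouge">all_cuboids</code> is hypothetical and not part of Kylin; it simply treats every subset of the dimensions, including the empty one, as a cuboid, which matches the 8 and 16 quoted above.</p>
+
```python
from itertools import combinations


def all_cuboids(dimensions):
    """Enumerate every subset of the dimensions, including the empty one."""
    dims = list(dimensions)
    return [
        frozenset(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]


# Enumerating is only practical for small cubes; the count itself is 2 ** n.
for dims in (["a", "b", "c"], ["a", "b", "c", "d"]):
    print(len(dims), "dimensions ->", len(all_cuboids(dims)), "cuboids")

for n in (10, 20, 30):
    print(n, "dimensions ->", 2 ** n, "cuboids")
```
+
+<p>Each extra dimension doubles the count, which is why pruning matters long before a cube reaches 20 or 30 dimensions.</p>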
+
+<p>The solution is to prune unnecessary dimensions. As we previously discussed in http://kylin.apache.org/docs/howto/howto_optimize_cubes.html, this can be approached in two ways:</p>
+
+<p>First, we can remove dimensions that do NOT necessarily have to be dimensions. For example, imagine a date lookup table where cal_dt is the PK column and there are lots of derived columns like week_begin_dt and month_begin_dt. Even though analysts need week_begin_dt as a dimension, we can prune it because it can always be calculated from the dimension cal_dt. This is the “derived” optimization.</p>
+
+<p>Second, some of the combinations of dimensions can be pruned. This is the main topic of this article, and let’s call it “combination pruning”. For example, if a dimension is specified as “mandatory”, then all combinations without that dimension can be pruned. If dimensions A, B and C form a “hierarchy” relation, then only combinations with A, AB or ABC remain. Prior to 2.1, Kylin also had an “aggregation group” concept, which also served combination pruning. However, it was poorly documented and hard to understand (I also found it difficult to explain). Anyway, we’ll skip it, as we will redefine what “aggregation group” really is.</p>
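+
+<p>As a rough illustration of the mandatory and hierarchy rules just described, the sketch below filters the full set of cuboids for four dimensions A, B, C, D. The function names are hypothetical, and one assumption goes beyond the text: a cuboid that contains none of the hierarchy dimensions is treated as valid.</p>
+
```python
from itertools import combinations


def all_cuboids(dimensions):
    """Every subset of the dimensions, including the empty one."""
    dims = list(dimensions)
    return [
        frozenset(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]


def has_mandatory(cuboid, mandatory):
    """A mandatory dimension must appear in every surviving cuboid."""
    return set(mandatory) <= cuboid


def respects_hierarchy(cuboid, hierarchy):
    """Hierarchy levels present in the cuboid must form a prefix: A, AB or ABC.

    Assumption: a cuboid with none of the hierarchy levels is also accepted.
    """
    present = [level in cuboid for level in hierarchy]
    # A deeper level may only appear when the level right above it is present.
    return all(present[i] or not present[i + 1] for i in range(len(present) - 1))


cuboids = all_cuboids(["A", "B", "C", "D"])
with_hierarchy = [c for c in cuboids if respects_hierarchy(c, ["A", "B", "C"])]
with_both = [c for c in with_hierarchy if has_mandatory(c, ["D"])]

print(len(cuboids), len(with_hierarchy), len(with_both))  # 16 8 4
```
+
+<p>Under these assumptions the hierarchy halves the cuboid count, and making D mandatory halves it again.</p>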
+
+<p>During our open source practice we found some significant drawbacks in the original combination pruning techniques. Firstly, these techniques are isolated rather than systematically designed. Secondly, the original aggregation group is so poorly designed and documented that it is hardly used outside eBay. Thirdly, and most importantly, it is not expressive enough in terms of describing semantics.</p>
+
+<p>To illustrate the semantic expressiveness issue, let’s imagine a transaction data cube with a very high cardinality dimension called buyer_id, as well as other normal dimensions like the transaction date cal_dt, the buyer’s location city, etc. The analyst might need to get an overview by grouping on non-buyer_id dimensions, for example grouping only by cal_dt. The analyst might also need to drill down to check a specific buyer’s behavior by providing a buyer_id filter. Given that buyer_id has really high cardinality, once the buyer_id is determined, the related records should be very few (so it is okay to just use the base cuboid and do some query-time aggregation to “aggregate out” the unwanted dimensions). In such cases the expected output of the pruning policy should be:</p>
+
+<table>
+  <tbody>
+    <tr>
+      <th>Cuboid</th>
+      <th>Compute or Skip</th>
+      <th>Reason</th>
+    </tr>
+    <tr>
+      <td>city</td>
+      <td>compute</td>
+      <td>Group by location</td>
+    </tr>
+    <tr>
+      <td>cal_dt</td>
+      <td>compute</td>
+      <td>Group by date</td>
+    </tr>
+    <tr>
+      <td>buyer_id</td>
+      <td>skip</td>
+      <td>Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid</td>
+    </tr>
+    <tr>
+      <td>city,cal_dt</td>
+      <td>compute</td>
+      <td>Group by location and date</td>
+    </tr>
+    <tr>
+      <td>city,buyer_id</td>
+      <td>skip</td>
+      <td>Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid</td>
+    </tr>
+    <tr>
+      <td>cal_dt,buyer_id</td>
+      <td>skip</td>
+      <td>Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid</td>
+    </tr>
+    <tr>
+      <td>city,cal_dt,buyer_id</td>
+      <td>compute</td>
+      <td>Base cuboid</td>
+    </tr>
+  </tbody>
+</table>
+
+<hr />
+
+<p>Unfortunately, there was no way to express such pruning settings with the semantic tools available prior to Kylin 2.1.</p>
+
+<h2 id="new-aggregation-group-design">New Aggregation Group Design</h2>
+
+<p>In Kylin 2.1 we redesigned the aggregation group mechanism in the JIRA issue https://issues.apache.org/jira/browse/KYLIN-242. The issue was named “Kylin Cuboid Whitelist” because the new design even enables the cube designer to specify the expected cuboids by keeping a whitelist. Imagine how expressive it can be!</p>
+
+<p>In the new design, an aggregation group (abbr. AGG) is defined as a cluster of cuboids that are subject to shared rules. The cube designer can define one or more AGGs for a cube, and the union of the cuboids contributed by all AGGs constitutes the valid combinations for the cube. Notice that a cuboid is allowed to appear in multiple AGGs, and it will still be computed only once during cube building.</p>
+
+<p>If you look into the internals of AGG ( https://github.com/apache/kylin/blob/2.x-staging/core-cube/src/main/java/org/apache/kylin/cube/model/AggregationGroup.java) there are two important properties defined: <code class="highlighter-rouge">@JsonProperty("includes")</code> and <code class="highlighter-rouge">@JsonProperty("select_rule")</code>.</p>
+
+<p><code class="highlighter-rouge">@JsonProperty("includes")</code><br />
+This property specifies which dimensions are included in the AGG. The value of the property must be a subset of the cube’s complete dimensions. Keep the property minimal by including only the necessary dimensions.</p>
+
+<p><code class="highlighter-rouge">@JsonProperty("select_rule")</code><br />
+Select rules are the rules that all valid cuboids in the AGG are subject to. Here cube designers can define multiple rules to apply to the included dimensions. Currently there are three types of rules:</p>
+
+<ul>
+  <li>Hierarchy rules, described above</li>
+  <li>Mandatory rule, described above</li>
+  <li>Joint rules. This is a newly introduced rule. If two or more dimensions are “joint”, then any valid cuboid will either contain none of these dimensions, or contain them all. In other words, these dimensions will always be “together”. This is useful when the cube designer is sure some of the dimensions will always be queried together. It is also a nuclear weapon for combination pruning on less-likely-to-use dimensions. Suppose you have 20 dimensions, where the first 10 are frequently used and the latter 10 are less likely to be used. By declaring the latter 10 dimensions as “joint”, they behave like a single dimension, so you’re effectively reducing the number of cuboids from 2<sup>20</sup> to 2<sup>11</sup>. Actually this is pretty much what the old “aggregation group” mechanism was for. If you were using it prior to Kylin 2.1, our metadata upgrade tool will automatically translate it to joint semantics.</li>
+</ul>
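+
+<p>The arithmetic behind the joint rule example above (20 dimensions, the last 10 joined) can be reproduced with a tiny sketch. The function <code class="highlighter-rouge">cuboid_count</code> is a hypothetical helper, and it assumes no other select rule is active: every free dimension and every joint group is one independent in-or-out choice.</p>
+
```python
def cuboid_count(free_dims, joint_groups):
    """Each free dimension and each joint group is one in-or-out choice."""
    return 2 ** (free_dims + joint_groups)


# 20 independent dimensions, no joint rule.
without_joint = cuboid_count(free_dims=20, joint_groups=0)

# 10 independent dimensions plus one joint group holding the other 10.
with_joint = cuboid_count(free_dims=10, joint_groups=1)

print(without_joint, with_joint, without_joint // with_joint)  # 1048576 2048 512
```
+
+<p>In this scenario a single joint rule shrinks the cube to 1/512 of its unpruned cuboid count.</p>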
+
+<p>By flexibly using the new aggregation groups you can, in theory, control exactly which cuboids to compute or skip. This can significantly reduce the computation and storage overhead, especially when the cube serves a fixed dashboard that repeatedly issues SQL queries requiring only some specific cuboids.  In extreme cases you can configure each AGG to contain only one cuboid, so that a handful of AGGs make up the cuboid whitelist you need.</p>
+
+<p>Kylin’s cuboid computation scheduler arranges the computation order of all valid cuboids based on the AGG definitions. You don’t need to care about how it is implemented, because every cuboid will be computed, and computed only once. The only thing you need to keep in mind is: don’t abuse AGGs. Leverage AGG select rules as much as possible, and avoid introducing a lot of “single cuboid AGGs” unless it is really necessary. Too many AGGs are a burden for the cuboid computation scheduler as well as the query engine.</p>
+
+<h2 id="buyerid-issue-revisited">Buyer_id issue revisited</h2>
+
+<p>Now that we have the new AGG tool, the buyer_id issue can be revisited. What we need to do is define two AGGs for the cube:</p>
+
+<ul>
+  <li><code class="highlighter-rouge">AGG1</code> includes: [cal_dt, city, buyer_id], select_rule: {joint: [cal_dt, city, buyer_id]}</li>
+  <li><code class="highlighter-rouge">AGG2</code> includes: [cal_dt, city], select_rule: {}</li>
+</ul>
+
+<p>The first AGG will contribute the base cuboid only, and the second AGG will contribute all the cuboids without buyer_id.</p>
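+
+<p>To see that these two AGGs produce the cuboids wanted for the buyer_id scenario, here is a minimal Python model of the union semantics. It is only a sketch: <code class="highlighter-rouge">agg_cuboids</code> is a hypothetical helper that handles the joint rule alone, it is not Kylin’s scheduler, and it leaves out the empty combination so the result can be compared with the non-empty cuboids of the scenario.</p>
+
```python
from itertools import combinations


def agg_cuboids(includes, joint=()):
    """Cuboids contributed by one AGG (hypothetical model, joint rule only)."""
    joint_groups = [frozenset(group) for group in joint]
    joined = set().union(*joint_groups)
    # A joint group acts as one unit; every other dimension is its own unit.
    units = joint_groups + [frozenset([d]) for d in includes if d not in joined]
    cuboids = set()
    for size in range(1, len(units) + 1):  # size 0 (empty cuboid) is skipped
        for combo in combinations(units, size):
            cuboids.add(frozenset().union(*combo))
    return cuboids


agg1 = agg_cuboids(
    ["cal_dt", "city", "buyer_id"],
    joint=[["cal_dt", "city", "buyer_id"]],
)
agg2 = agg_cuboids(["cal_dt", "city"])

# The cube's valid cuboids are the union; a set keeps each cuboid only once.
valid = agg1 | agg2

for cuboid in sorted(valid, key=lambda c: (len(c), sorted(c))):
    print(",".join(sorted(cuboid)))
```
+
+<p>The union holds four cuboids: cal_dt, city, the cal_dt and city pair, and the base cuboid. No other cuboid containing buyer_id is produced.</p>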
+
+<h2 id="start-using-it">Start using it</h2>
+
+<p>The new aggregation group mechanism should be available in Kylin 2.1. As of today (2016.2.18), Kylin 2.1 has not been released yet. Use it at your own risk by compiling the latest 2.x-staging code branch.</p>
+
+<p>Legacy users will need to upgrade their metadata store from Kylin 2.0. A cube rebuild is not required when upgrading from Kylin 2.0.</p>
+
+
+  </article>
+
+</div>
+
+
+
+
+
+				</article>
+			</div>
+		</div>		
+		<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+<footer id="underfooter">
+    <div class="container">
+        <div class="row">
+            <div class="col-md-12 widget">
+                <div class="widget-body" style="text-align:center">
+                    <a href="http://www.apache.org">
+                        <img id="asf-logo" alt="Apache Software Foundation" src="/assets/images/feather-small.gif">
+                    </a>
+
+                    <div>
+                        The contents of this website are © 2015 Apache Software Foundation under the terms of the <a
+                            href="http://www.apache.org/licenses/LICENSE-2.0"> Apache License v2 </a>. Apache Kylin and
+                        its logo are trademarks of the Apache Software Foundation.
+                    </div>
+
+                </div>
+            </div>
+        </div>
+        <!-- /row of widgets -->
+
+    </div>
+    <div></div>
+
+</footer>
+
+	<script src="/assets/js/jquery-1.9.1.min.js"></script> 
+	<script src="/assets/js/bootstrap.min.js"></script> 
+	<script src="/assets/js/main.js"></script>
+	</body>
+</html>
+
+
+
+

Modified: kylin/site/blog/index.html
URL: http://svn.apache.org/viewvc/kylin/site/blog/index.html?rev=1730998&r1=1730997&r2=1730998&view=diff
==============================================================================
--- kylin/site/blog/index.html (original)
+++ kylin/site/blog/index.html Thu Feb 18 03:57:21 2016
@@ -174,6 +174,12 @@
             
             <li>
         <h2 align="left" style="margin:0px">
+          <a class="post-link" href="/blog/2016/02/18/new-agg/">New Aggregation Group</a></h2><div align="left" class="post-meta">posted: Feb 18, 2016</div>
+        
+      </li>
+    
+            <li>
+        <h2 align="left" style="margin:0px">
           <a class="post-link" href="/blog/2016/02/03/streaming-cubing/">Streaming cubing (Prototype)</a></h2><div align="left" class="post-meta">posted: Feb 3, 2016</div>
         
       </li>

Modified: kylin/site/docs/index.html
URL: http://svn.apache.org/viewvc/kylin/site/docs/index.html?rev=1730998&r1=1730997&r2=1730998&view=diff
==============================================================================
--- kylin/site/docs/index.html (original)
+++ kylin/site/docs/index.html Thu Feb 18 03:57:21 2016
@@ -1827,8 +1827,6 @@
 
 <p>Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, original contributed from eBay Inc.</p>
 
-<p>Future documents: <a href="/docs2/">v2.x</a></p>
-
 <h2 id="installation--setup">Installation &amp; Setup</h2>
 
 <p>Please follow installation &amp; tutorial in the navigation panel.</p>

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1730998&r1=1730997&r2=1730998&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Thu Feb 18 03:57:21 2016
@@ -19,11 +19,138 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 17 Feb 2016 13:42:30 +0000</pubDate>
-    <lastBuildDate>Wed, 17 Feb 2016 13:42:30 +0000</lastBuildDate>
+    <pubDate>Wed, 17 Feb 2016 19:56:15 -0800</pubDate>
+    <lastBuildDate>Wed, 17 Feb 2016 19:56:15 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>New Aggregation Group</title>
+        <description>&lt;p&gt;Full title of this article: &lt;strong&gt;New Aggregation Group Design to Tackle Curse of Dimension Problem (Especially when high cardinality dimensions exist)&lt;/strong&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;abstract&quot;&gt;Abstract&lt;/h2&gt;
+
+&lt;p&gt;The curse of dimensionality is an infamous problem for all OLAP engines based on pre-calculation. In versions prior to 2.1, Kylin tried to address the problem with a few simple techniques, which relieved it to some degree. During our open source practice, we found that these techniques lacked systematic design thinking and were incapable of addressing many common issues. In Kylin 2.1 we redesigned the aggregation group mechanism so that it better serves all kinds of cube design scenarios.&lt;/p&gt;
+
+&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
+
+&lt;p&gt;It is a known fact that Kylin speeds up query performance by pre-calculating cubes, which in turn contain different combinations of all dimensions, a.k.a. cuboids. The problem is that the number of cuboids grows exponentially with the number of dimensions. For example, there are 8 possible cuboids in total for a cube with 3 dimensions, but 16 possible cuboids for a cube with 4 dimensions; in general, n dimensions yield 2&lt;sup&gt;n&lt;/sup&gt; cuboids. Even though Kylin uses a scalable computation framework (MapReduce) and scalable storage (HBase) to compute and store the cubes, it is still unacceptable if the cube size turns out to be many times bigger than the original data source.&lt;/p&gt;
+
+&lt;p&gt;The solution is to prune unnecessary dimensions. As we previously discussed in http://kylin.apache.org/docs/howto/howto_optimize_cubes.html, this can be approached in two ways:&lt;/p&gt;
+
+&lt;p&gt;First, we can remove dimensions that do NOT necessarily have to be dimensions. For example, imagine a date lookup table where cal_dt is the PK column and there are lots of derived columns like week_begin_dt and month_begin_dt. Even though analysts need week_begin_dt as a dimension, we can prune it because it can always be calculated from the dimension cal_dt. This is the “derived” optimization.&lt;/p&gt;
+
+&lt;p&gt;Second, some of the combinations of dimensions can be pruned. This is the main topic of this article, and let’s call it “combination pruning”. For example, if a dimension is specified as “mandatory”, then all combinations without that dimension can be pruned. If dimensions A, B and C form a “hierarchy” relation, then only combinations with A, AB or ABC remain. Prior to 2.1, Kylin also had an “aggregation group” concept, which also served combination pruning. However, it was poorly documented and hard to understand (I also found it difficult to explain). Anyway, we’ll skip it, as we will redefine what “aggregation group” really is.&lt;/p&gt;
+
+&lt;p&gt;During our open source practice we found some significant drawbacks in the original combination pruning techniques. Firstly, these techniques are isolated rather than systematically designed. Secondly, the original aggregation group is so poorly designed and documented that it is hardly used outside eBay. Thirdly, and most importantly, it is not expressive enough in terms of describing semantics.&lt;/p&gt;
+
+&lt;p&gt;To illustrate the semantic expressiveness issue, let’s imagine a transaction data cube with a very high cardinality dimension called buyer_id, as well as other normal dimensions like the transaction date cal_dt, the buyer’s location city, etc. The analyst might need to get an overview by grouping on non-buyer_id dimensions, for example grouping only by cal_dt. The analyst might also need to drill down to check a specific buyer’s behavior by providing a buyer_id filter. Given that buyer_id has really high cardinality, once the buyer_id is determined, the related records should be very few (so it is okay to just use the base cuboid and do some query-time aggregation to “aggregate out” the unwanted dimensions). In such cases the expected output of the pruning policy should be:&lt;/p&gt;
+
+&lt;table&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;th&gt;Cuboid&lt;/th&gt;
+      &lt;th&gt;Compute or Skip&lt;/th&gt;
+      &lt;th&gt;Reason&lt;/th&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;city&lt;/td&gt;
+      &lt;td&gt;compute&lt;/td&gt;
+      &lt;td&gt;Group by location&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;cal_dt&lt;/td&gt;
+      &lt;td&gt;compute&lt;/td&gt;
+      &lt;td&gt;Group by date&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;buyer_id&lt;/td&gt;
+      &lt;td&gt;skip&lt;/td&gt;
+      &lt;td&gt;Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;city,cal_dt&lt;/td&gt;
+      &lt;td&gt;compute&lt;/td&gt;
+      &lt;td&gt;Group by location and date&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;city,buyer_id&lt;/td&gt;
+      &lt;td&gt;skip&lt;/td&gt;
+      &lt;td&gt;Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;cal_dt,buyer_id&lt;/td&gt;
+      &lt;td&gt;skip&lt;/td&gt;
+      &lt;td&gt;Grouping by buyer yields too many results to analyze; buyer_id should be used as a filter and served from the base cuboid&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;city,cal_dt,buyer_id&lt;/td&gt;
+      &lt;td&gt;compute&lt;/td&gt;
+      &lt;td&gt;Base cuboid&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;hr /&gt;
+
+&lt;p&gt;Unfortunately, there was no way to express such pruning settings with the semantic tools available prior to Kylin 2.1.&lt;/p&gt;
+
+&lt;h2 id=&quot;new-aggregation-group-design&quot;&gt;New Aggregation Group Design&lt;/h2&gt;
+
+&lt;p&gt;In Kylin 2.1 we redesigned the aggregation group mechanism in the JIRA issue https://issues.apache.org/jira/browse/KYLIN-242. The issue was named “Kylin Cuboid Whitelist” because the new design even enables the cube designer to specify the expected cuboids by keeping a whitelist. Imagine how expressive it can be!&lt;/p&gt;
+
+&lt;p&gt;In the new design, an aggregation group (abbr. AGG) is defined as a cluster of cuboids that are subject to shared rules. The cube designer can define one or more AGGs for a cube, and the union of the cuboids contributed by all AGGs constitutes the valid combinations for the cube. Notice that a cuboid is allowed to appear in multiple AGGs, and it will still be computed only once during cube building.&lt;/p&gt;
+
+&lt;p&gt;If you look into the internals of AGG ( https://github.com/apache/kylin/blob/2.x-staging/core-cube/src/main/java/org/apache/kylin/cube/model/AggregationGroup.java) there are two important properties defined: &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;includes&quot;)&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;select_rule&quot;)&lt;/code&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;includes&quot;)&lt;/code&gt;&lt;br /&gt;
+This property specifies which dimensions are included in the AGG. The value of the property must be a subset of the cube’s complete dimensions. Keep the property minimal by including only the necessary dimensions.&lt;/p&gt;
+
+&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;select_rule&quot;)&lt;/code&gt;&lt;br /&gt;
+Select rules are the rules that all valid cuboids in the AGG are subject to. Here cube designers can define multiple rules to apply to the included dimensions. Currently there are three types of rules:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Hierarchy rules, described above&lt;/li&gt;
+  &lt;li&gt;Mandatory rule, described above&lt;/li&gt;
+  &lt;li&gt;Joint rules. This is a newly introduced rule. If two or more dimensions are “joint”, then any valid cuboid will either contain none of these dimensions, or contain them all. In other words, these dimensions will always be “together”. This is useful when the cube designer is sure some of the dimensions will always be queried together. It is also a nuclear weapon for combination pruning on less-likely-to-use dimensions. Suppose you have 20 dimensions, where the first 10 are frequently used and the latter 10 are less likely to be used. By declaring the latter 10 dimensions as “joint”, they behave like a single dimension, so you’re effectively reducing the number of cuboids from 2&lt;sup&gt;20&lt;/sup&gt; to 2&lt;sup&gt;11&lt;/sup&gt;. Actually this is pretty much what the old “aggregation group” mechanism was for. If you were using it prior to Kylin 2.1, our metadata upgrade tool will automatically translate it to joint semantics.&lt;/li&gt;
+&lt;/ul&gt;
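+
+&lt;p&gt;The joint-rule arithmetic can be checked with a small standalone sketch. It is written in Python for illustration only and does not use any Kylin API; the function names &lt;code class=&quot;highlighter-rouge&quot;&gt;count_cuboids&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;enumerate_cuboids&lt;/code&gt; are hypothetical. It assumes joint groups do not overlap, counts the empty combination (as the 2&lt;sup&gt;20&lt;/sup&gt; figure does), and ignores hierarchy and mandatory rules.&lt;/p&gt;
+

```python
from itertools import combinations


def count_cuboids(dimensions, joints=()):
    """Closed form: each free dimension and each joint group is one on/off choice."""
    joint_members = {dim for group in joints for dim in group}
    free = [dim for dim in dimensions if dim not in joint_members]
    return 2 ** (len(free) + len(joints))


def enumerate_cuboids(dimensions, joints=()):
    """Brute force: keep every subset in which each joint group is all-or-nothing."""
    valid = []
    for size in range(len(dimensions) + 1):
        for combo in combinations(dimensions, size):
            chosen = set(combo)
            if all(len(chosen.intersection(group)) in (0, len(group)) for group in joints):
                valid.append(combo)
    return valid


# 20 hypothetical dimensions; the latter 10 are declared joint.
dims = ["d%02d" % i for i in range(20)]
print(count_cuboids(dims))                      # no rules: 2 ** 20
print(count_cuboids(dims, joints=[dims[10:]]))  # latter 10 joint: 2 ** 11

# Cross-check the closed form against enumeration on a small case.
small = ["a", "b", "c", "d", "e"]
assert count_cuboids(small, [small[2:]]) == len(enumerate_cuboids(small, [small[2:]]))
```

+
+&lt;p&gt;The closed form treats each joint group as a single pseudo-dimension, which is why joining 10 of 20 dimensions leaves 10 + 1 independent choices.&lt;/p&gt;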
+
+&lt;p&gt;By flexibly using the new aggregation groups you can, in theory, control exactly which cuboids are computed or skipped. This can significantly reduce computation and storage overhead, especially when the cube serves a fixed dashboard, which produces SQL queries that require only some specific cuboids. In extreme cases you can configure each AGG to contain only one cuboid, so that a handful of AGGs make up the whitelist of cuboids you need.&lt;/p&gt;
+
+&lt;p&gt;Kylin’s cuboid computation scheduler arranges the computation order of all valid cuboids based on the AGG definitions. You don’t need to care about how it’s implemented, because every valid cuboid will be computed, and computed only once. The only thing you need to keep in mind is: don’t abuse AGG. Leverage AGG’s select rules as much as possible, and avoid introducing a lot of “single cuboid AGGs” unless it’s really necessary. Too many AGGs are a burden for the cuboid computation scheduler as well as the query engine.&lt;/p&gt;
+
+&lt;h2 id=&quot;buyerid-issue-revisited&quot;&gt;Buyer_id issue revisited&lt;/h2&gt;
+
+&lt;p&gt;Now that we have got the new AGG tool, the buyer_id issue can be revisited. What we need to do is to define two AGGs for the cube:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;AGG1&lt;/code&gt; includes: [cal_dt, city, buyer_id] select_rules:{joint:[cal_dt,city,buyer_id]}&lt;/li&gt;
+  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;AGG2&lt;/code&gt; includes: [cal_dt, city] select_rules:{}&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;The first AGG will contribute the base cuboid only, and the second AGG will contribute all the cuboids without buyer_id.&lt;/p&gt;
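+
+&lt;p&gt;As a rough check of the two-AGG design, the sketch below enumerates the cuboids each group allows and merges them. It is a standalone Python illustration, not Kylin code; &lt;code class=&quot;highlighter-rouge&quot;&gt;agg_cuboids&lt;/code&gt; is a hypothetical helper that models only the joint rule and leaves out the empty combination.&lt;/p&gt;
+

```python
from itertools import combinations


def agg_cuboids(includes, joints=()):
    """Return the non-empty dimension sets allowed by one aggregation group."""
    result = set()
    for size in range(1, len(includes) + 1):
        for combo in combinations(includes, size):
            chosen = frozenset(combo)
            # A joint group must be entirely present or entirely absent.
            if all(len(chosen.intersection(group)) in (0, len(group)) for group in joints):
                result.add(chosen)
    return result


# AGG1: all three dimensions are joint, so only the base cuboid survives.
agg1 = agg_cuboids(["cal_dt", "city", "buyer_id"],
                   joints=[["cal_dt", "city", "buyer_id"]])
# AGG2: no rules, so every combination of cal_dt and city is allowed.
agg2 = agg_cuboids(["cal_dt", "city"])

# Cuboids shared between groups are counted once.
all_cuboids = agg1.union(agg2)
for cuboid in sorted(all_cuboids, key=lambda c: (len(c), sorted(c))):
    print(sorted(cuboid))
```

+
+&lt;p&gt;Under these assumptions the merged set holds four cuboids, and buyer_id appears only in the base cuboid.&lt;/p&gt;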
+
+&lt;h2 id=&quot;start-using-it&quot;&gt;Start using it&lt;/h2&gt;
+
+&lt;p&gt;The new aggregation group mechanism should be available in Kylin 2.1. As of today (2016.2.18), Kylin 2.1 has not been released yet. You can try it at your own risk by compiling the latest 2.x-staging code branch.&lt;/p&gt;
+
+&lt;p&gt;Legacy users will need to upgrade their metadata store from Kylin 2.0. A cube rebuild is not required if you’re upgrading from Kylin 2.0.&lt;/p&gt;
+
+</description>
+        <pubDate>Thu, 18 Feb 2016 08:30:00 -0800</pubDate>
+        <link>http://kylin.apache.org/blog/2016/02/18/new-agg/</link>
+        <guid isPermaLink="true">http://kylin.apache.org/blog/2016/02/18/new-agg/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Streaming cubing (Prototype)</title>
         <description>&lt;p&gt;One of the most important features in the 2.x branches is streaming cubing, which enables OLAP analysis on streaming data. Streaming cubing delivers faster insights into the data to help make business decisions more promptly. Even though there are already many real-time analysis tools in the open source community, Kylin Streaming cubing still differs from them in several ways:&lt;/p&gt;
 
@@ -46,7 +173,7 @@
 &lt;p&gt;We’ll publish more detailed documents on how to use Kylin Streaming soon. In latest 2.x branch we are also working on more complicated load balancing schemes for streaming cubing. Please stay tuned.&lt;/p&gt;
 
 </description>
-        <pubDate>Wed, 03 Feb 2016 16:30:00 +0000</pubDate>
+        <pubDate>Wed, 03 Feb 2016 08:30:00 -0800</pubDate>
         <link>http://kylin.apache.org/blog/2016/02/03/streaming-cubing/</link>
         <guid isPermaLink="true">http://kylin.apache.org/blog/2016/02/03/streaming-cubing/</guid>
         
@@ -78,7 +205,7 @@ With sub-seconds query latency feature o
 
 &lt;p&gt;Enjoy!&lt;/p&gt;
 </description>
-        <pubDate>Fri, 25 Dec 2015 23:23:00 +0000</pubDate>
+        <pubDate>Fri, 25 Dec 2015 15:23:00 -0800</pubDate>
         <link>http://kylin.apache.org/blog/2015/12/25/support-powerbi-tableau9/</link>
         <guid isPermaLink="true">http://kylin.apache.org/blog/2015/12/25/support-powerbi-tableau9/</guid>
         
@@ -110,7 +237,7 @@ With sub-seconds query latency feature o
 &lt;p&gt;Enjoy!&lt;/p&gt;
 
 </description>
-        <pubDate>Fri, 25 Dec 2015 23:23:00 +0000</pubDate>
+        <pubDate>Fri, 25 Dec 2015 15:23:00 -0800</pubDate>
         <link>http://kylin.apache.org/cn/blog/2015/12/25/support-powerbi-tableau9/</link>
         <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2015/12/25/support-powerbi-tableau9/</guid>
         
@@ -120,67 +247,6 @@ With sub-seconds query latency feature o
       </item>
     
       <item>
-        <title>Apache Kylin v1.2 Officially Released</title>
-        <description>&lt;p&gt;The Apache Kylin community is very pleased to announce the official release of Apache Kylin v1.2, the first release since the project graduated to an Apache Top-Level Project.&lt;/p&gt;
-
-&lt;p&gt;Apache Kylin is an open source distributed analytics engine that provides a SQL query interface and multi-dimensional analysis (OLAP) on Hadoop to support extremely large datasets. It was originally developed by eBay Inc. and contributed to the open source community.&lt;/p&gt;
-
-&lt;p&gt;To download the Apache Kylin v1.2 source code and binary packages, &lt;br /&gt;
-please visit the &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
-
-&lt;p&gt;This is a major release that brings a more stable, robust, and manageable version. The Apache Kylin community resolved 44 issues, including bug fixes, improvements, and some new features.&lt;/p&gt;
-
-&lt;h2 id=&quot;section&quot;&gt;Change Highlights&lt;/h2&gt;
-
-&lt;p&gt;&lt;strong&gt;Kylin Core Improvements&lt;/strong&gt;&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Support for Excel, Power BI, and Tableau 9.1 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-596&quot;&gt;KYLIN-596&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1065&quot;&gt;KYLIN-1065&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Improved handling of small files on HDFS &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-702&quot;&gt;KYLIN-702&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Improved Hive HCatalog checks in the environment check script &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1081&quot;&gt;KYLIN-1081&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1119&quot;&gt;KYLIN-1119&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Dictionary encoding for dimension columns supports cardinality above ten million &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1099&quot;&gt;KYLIN-1099&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Improved loading performance of the Job page &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1154&quot;&gt;KYLIN-1154&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Memory budget allocated per query &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1190&quot;&gt;KYLIN-1190&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;strong&gt;Main Bug Fixes&lt;/strong&gt;&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Fixed a bug when saving a cube in edit mode &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1168&quot;&gt;KYLIN-1168&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Cube cannot be renamed after creation &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-693&quot;&gt;KYLIN-693&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Cube list disappears on the project page &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-930&quot;&gt;KYLIN-930&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Error when joining two subqueries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1033&quot;&gt;KYLIN-1033&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Wrong result when the filter condition is (A or false) &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1039&quot;&gt;KYLIN-1039&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support fetching MapReduce job status in a ResourceManager HA environment &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1067&quot;&gt;KYLIN-1067&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Email cannot be sent after Build Base Cuboid Data fails &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1106&quot;&gt;KYLIN-1106&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;ResourceTool download/upload does not work in the binary package &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1121&quot;&gt;KYLIN-1121&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Kylin sample cube “kylin_sales_cube” cannot be saved &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1140&quot;&gt;KYLIN-1140&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Unit tests using Minicluster do not work in the 1.x branch &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1155&quot;&gt;KYLIN-1155&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Date formats such as ’YYYYMMDD’ cannot be parsed in queries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1216&quot;&gt;KYLIN-1216&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;  &lt;br /&gt;
-We recommend upgrading to this version from earlier releases for better performance, stability, and bug fixes,&lt;br /&gt;
-and to keep up to date with the community’s latest features and support.&lt;/p&gt;
-
-&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;  &lt;br /&gt;
-If you have any issue or question during upgrade or use, please: &lt;br /&gt;
-open a JIRA in the Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;  &lt;br /&gt;
-or  &lt;br /&gt;
-send mail to the Apache Kylin mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;&lt;em&gt;Thanks to everyone for participating and contributing!&lt;/em&gt;&lt;/p&gt;
-</description>
-        <pubDate>Wed, 23 Dec 2015 22:28:00 +0000</pubDate>
-        <link>http://kylin.apache.org/cn/blog/2015/12/23/release-v1.2/</link>
-        <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2015/12/23/release-v1.2/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
         <title>Apache Kylin v1.2 Release Announcement</title>
         <description>&lt;p&gt;The Apache Kylin community is pleased to announce the release of Apache Kylin v1.2, the first release after graduation.&lt;/p&gt;
 
@@ -232,7 +298,7 @@ send mail to Apache Kylin dev mailing li
 
 &lt;p&gt;&lt;em&gt;Great thanks to everyone who contributed!&lt;/em&gt;&lt;/p&gt;
 </description>
-        <pubDate>Wed, 23 Dec 2015 22:28:00 +0000</pubDate>
+        <pubDate>Wed, 23 Dec 2015 14:28:00 -0800</pubDate>
         <link>http://kylin.apache.org/blog/2015/12/23/release-v1.2/</link>
         <guid isPermaLink="true">http://kylin.apache.org/blog/2015/12/23/release-v1.2/</guid>
         
@@ -242,59 +308,60 @@ send mail to Apache Kylin dev mailing li
       </item>
     
       <item>
-        <title>Apache Kylin v1.1 (incubating) Release Announcement</title>
-        <description>&lt;p&gt;The Apache Kylin community is pleased to announce the release of Apache Kylin v1.1 (incubating).&lt;/p&gt;
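+        <title>Apache Kylin v1.2 Officially Released</title>
+        <description>&lt;p&gt;The Apache Kylin community is very pleased to announce the official release of Apache Kylin v1.2, the first release since the project graduated to an Apache Top-Level Project.&lt;/p&gt;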
 
-&lt;p&gt;Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc.&lt;/p&gt;
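+&lt;p&gt;Apache Kylin is an open source distributed analytics engine that provides a SQL query interface and multi-dimensional analysis (OLAP) on Hadoop to support extremely large datasets. It was originally developed by eBay Inc. and contributed to the open source community.&lt;/p&gt;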
 
-&lt;p&gt;To download Apache Kylin v1.1 (incubating) source code or binary package: &lt;br /&gt;
-please visit the &lt;a href=&quot;http://kylin.apache.org/download&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
+&lt;p&gt;To download the Apache Kylin v1.2 source code and binary packages, &lt;br /&gt;
+please visit the &lt;a href=&quot;http://kylin.apache.org/cn/download/&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
 
-&lt;p&gt;This is a major release that brings a more stable, robust, and well-managed version. The Apache Kylin community resolved about 56 issues, including bug fixes, improvements, and a few new features.&lt;/p&gt;
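+&lt;p&gt;This is a major release that brings a more stable, robust, and manageable version. The Apache Kylin community resolved 44 issues, including bug fixes, improvements, and some new features.&lt;/p&gt;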
 
-&lt;h2 id=&quot;change-highlights&quot;&gt;Change Highlights&lt;/h2&gt;
+&lt;h2 id=&quot;section&quot;&gt;Change Highlights&lt;/h2&gt;
 
-&lt;p&gt;&lt;strong&gt;Kylin Core Improvement&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Kylin Core Improvements&lt;/strong&gt;&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Support data retention by cube &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-906&quot;&gt;KYLIN-906&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Upgraded Apache Calcite to 1.4 for more bug fixes and SQL functions &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1047&quot;&gt;KYLIN-1047&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Cleanup intermediate Hive data after cube build &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-589&quot;&gt;KYLIN-589&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Continue cube job when Hive return empty resultset &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-772&quot;&gt;KYLIN-772&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support setting for HBase compression with Snappy or GZip &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-956&quot;&gt;KYLIN-956&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support load data to separated HBase cluster &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-957&quot;&gt;KYLIN-957&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Introduced Roaring bitmaps for InvertedIndex, contributed by Daniel Lemire &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1034&quot;&gt;KYLIN-1034&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Support for Excel, Power BI, and Tableau 9.1 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-596&quot;&gt;KYLIN-596&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1065&quot;&gt;KYLIN-1065&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Improved handling of small files on HDFS &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-702&quot;&gt;KYLIN-702&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Improved Hive HCatalog checks in the environment check script &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1081&quot;&gt;KYLIN-1081&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1119&quot;&gt;KYLIN-1119&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Dictionary encoding for dimension columns supports cardinality above ten million &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1099&quot;&gt;KYLIN-1099&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Improved loading performance of the Job page &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1154&quot;&gt;KYLIN-1154&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Memory budget allocated per query &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1190&quot;&gt;KYLIN-1190&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;&lt;strong&gt;Main Bug Fixes&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Main Bug Fixes&lt;/strong&gt;&lt;/p&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;Slowness with many IN() values &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-740&quot;&gt;KYLIN-740&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Web UI “Jobs” issue &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-950&quot;&gt;KYLIN-950&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Query cache is not evicted when metadata changed &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-771&quot;&gt;KYLIN-771&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Select * from fact not work &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-847&quot;&gt;KYLIN-847&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Float can’t be cast to Double when execute SQL &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-918&quot;&gt;KYLIN-918&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Update cube data model may fail and leave metadata in inconsistent state &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-958&quot;&gt;KYLIN-958&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;SQL keyword “offset” bug &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-983&quot;&gt;KYLIN-983&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;AVG not work &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-985&quot;&gt;KYLIN-985&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Dictionary with ‘’ value cause cube merge fail &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1004&quot;&gt;KYLIN-1004&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Fixed a bug when saving a cube in edit mode &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1168&quot;&gt;KYLIN-1168&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Cube cannot be renamed after creation &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-693&quot;&gt;KYLIN-693&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Cube list disappears on the project page &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-930&quot;&gt;KYLIN-930&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Error when joining two subqueries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1033&quot;&gt;KYLIN-1033&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Wrong result when the filter condition is (A or false) &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1039&quot;&gt;KYLIN-1039&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Support fetching MapReduce job status in a ResourceManager HA environment &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1067&quot;&gt;KYLIN-1067&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Email cannot be sent after Build Base Cuboid Data fails &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1106&quot;&gt;KYLIN-1106&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;ResourceTool download/upload does not work in the binary package &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1121&quot;&gt;KYLIN-1121&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Kylin sample cube “kylin_sales_cube” cannot be saved &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1140&quot;&gt;KYLIN-1140&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Unit tests using Minicluster do not work in the 1.x branch &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1155&quot;&gt;KYLIN-1155&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Date formats such as ’YYYYMMDD’ cannot be parsed in queries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1216&quot;&gt;KYLIN-1216&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;  &lt;br /&gt;
-We recommend upgrading to this version from v0.7.x and v1.0 for better performance, stability, and bug fixes.&lt;br /&gt;
-Upgrading also keeps you up to date with the community’s latest features and support.&lt;/p&gt;
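+&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;  &lt;br /&gt;
+We recommend upgrading to this version from earlier releases for better performance, stability, and bug fixes,&lt;br /&gt;
+and to keep up to date with the community’s latest features and support.&lt;/p&gt;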
 
-&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;  &lt;br /&gt;
-If you have any issue or question during the upgrade, please &lt;br /&gt;
-open JIRA to Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;  &lt;br /&gt;
-or  &lt;br /&gt;
-send mail to Apache Kylin dev mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;  &lt;br /&gt;
+If you have any issue or question during upgrade or use, please: &lt;br /&gt;
+open a JIRA in the Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;  &lt;br /&gt;
+or  &lt;br /&gt;
+send mail to the Apache Kylin mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
 
-&lt;p&gt;&lt;em&gt;Great thanks to everyone who contributed!&lt;/em&gt;&lt;/p&gt;
+&lt;p&gt;&lt;em&gt;Thanks to everyone for participating and contributing!&lt;/em&gt;&lt;/p&gt;
 </description>
-        <pubDate>Sun, 25 Oct 2015 17:28:00 +0000</pubDate>
-        <link>http://kylin.apache.org/blog/2015/10/25/release-v1.1-incubating/</link>
-        <guid isPermaLink="true">http://kylin.apache.org/blog/2015/10/25/release-v1.1-incubating/</guid>
+        <pubDate>Wed, 23 Dec 2015 14:28:00 -0800</pubDate>
+        <link>http://kylin.apache.org/cn/blog/2015/12/23/release-v1.2/</link>
+        <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2015/12/23/release-v1.2/</guid>
         
         
         <category>blog</category>
@@ -352,7 +419,7 @@ send mail to Apache Kylin dev mailing li
 
 &lt;p&gt;&lt;em&gt;Thanks to everyone for contributing!&lt;/em&gt;&lt;/p&gt;
 </description>
-        <pubDate>Sun, 25 Oct 2015 17:28:00 +0000</pubDate>
+        <pubDate>Sun, 25 Oct 2015 10:28:00 -0700</pubDate>
         <link>http://kylin.apache.org/cn/blog/2015/10/25/release-v1.1-incubating/</link>
         <guid isPermaLink="true">http://kylin.apache.org/cn/blog/2015/10/25/release-v1.1-incubating/</guid>
         
@@ -362,6 +429,66 @@ send mail to Apache Kylin dev mailing li
       </item>
     
       <item>
+        <title>Apache Kylin v1.1 (incubating) Release Announcement</title>
+        <description>&lt;p&gt;The Apache Kylin community is pleased to announce the release of Apache Kylin v1.1 (incubating).&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets. It was originally contributed by eBay Inc.&lt;/p&gt;
+
+&lt;p&gt;To download Apache Kylin v1.1 (incubating) source code or binary package: &lt;br /&gt;
+please visit the &lt;a href=&quot;http://kylin.apache.org/download&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
+
+&lt;p&gt;This is a major release that brings a more stable, robust, and well-managed version. The Apache Kylin community resolved about 56 issues, including bug fixes, improvements, and a few new features.&lt;/p&gt;
+
+&lt;h2 id=&quot;change-highlights&quot;&gt;Change Highlights&lt;/h2&gt;
+
+&lt;p&gt;&lt;strong&gt;Kylin Core Improvement&lt;/strong&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Support data retention by cube &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-906&quot;&gt;KYLIN-906&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Upgraded Apache Calcite to 1.4 for more bug fixes and SQL functions &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1047&quot;&gt;KYLIN-1047&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Cleanup intermediate Hive data after cube build &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-589&quot;&gt;KYLIN-589&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Continue cube job when Hive return empty resultset &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-772&quot;&gt;KYLIN-772&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Support setting for HBase compression with Snappy or GZip &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-956&quot;&gt;KYLIN-956&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Support load data to separated HBase cluster &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-957&quot;&gt;KYLIN-957&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Introduced Roaring bitmaps for InvertedIndex, contributed by Daniel Lemire &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1034&quot;&gt;KYLIN-1034&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;strong&gt;Main Bug Fixes&lt;/strong&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Slowness with many IN() values &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-740&quot;&gt;KYLIN-740&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Web UI “Jobs” issue &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-950&quot;&gt;KYLIN-950&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Query cache is not evicted when metadata changed &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-771&quot;&gt;KYLIN-771&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Select * from fact not work &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-847&quot;&gt;KYLIN-847&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Float can’t be cast to Double when execute SQL &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-918&quot;&gt;KYLIN-918&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Update cube data model may fail and leave metadata in inconsistent state &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-958&quot;&gt;KYLIN-958&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;SQL keyword “offset” bug &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-983&quot;&gt;KYLIN-983&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;AVG not work &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-985&quot;&gt;KYLIN-985&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Dictionary with ‘’ value cause cube merge fail &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1004&quot;&gt;KYLIN-1004&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;  &lt;br /&gt;
+We recommend upgrading to this version from v0.7.x and v1.0 for better performance, stability, and bug fixes.&lt;br /&gt;
+Upgrading also keeps you up to date with the community’s latest features and support.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;  &lt;br /&gt;
+If you have any issue or question during the upgrade, please &lt;br /&gt;
+open JIRA to Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;  &lt;br /&gt;
+or  &lt;br /&gt;
+send mail to Apache Kylin dev mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;em&gt;Great thanks to everyone who contributed!&lt;/em&gt;&lt;/p&gt;
+</description>
+        <pubDate>Sun, 25 Oct 2015 10:28:00 -0700</pubDate>
+        <link>http://kylin.apache.org/blog/2015/10/25/release-v1.1-incubating/</link>
+        <guid isPermaLink="true">http://kylin.apache.org/blog/2015/10/25/release-v1.1-incubating/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Kylin Meetup @Shanghai Oct 10, 2015</title>
         <description>&lt;p class=&quot;center&quot;&gt;&lt;img src=&quot;/images/blog/meetup_1.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
 
@@ -408,7 +535,7 @@ send mail to Apache Kylin dev mailing li
 
 &lt;p class=&quot;center&quot;&gt;&lt;img src=&quot;/images/blog/meetup_9.jpeg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
 </description>
-        <pubDate>Wed, 14 Oct 2015 17:00:00 +0000</pubDate>
+        <pubDate>Wed, 14 Oct 2015 10:00:00 -0700</pubDate>
         <link>http://kylin.apache.org/blog/2015/10/14/Apache-Kylin-Meetup/</link>
         <guid isPermaLink="true">http://kylin.apache.org/blog/2015/10/14/Apache-Kylin-Meetup/</guid>
         
@@ -549,141 +676,12 @@ No; the purpose of hybrid is to consolid
 &lt;p&gt;&lt;strong&gt;Question 7&lt;/strong&gt;: If a child cube is disabled, will it be scanned via the hybrid?&lt;br /&gt;
 No; hybrid instance will check the child realization’s status before sending query to it; so if the cube is disabled, it will not be scanned.&lt;/p&gt;
 </description>
-        <pubDate>Fri, 25 Sep 2015 16:00:00 +0000</pubDate>
+        <pubDate>Fri, 25 Sep 2015 09:00:00 -0700</pubDate>
         <link>http://kylin.apache.org/blog/2015/09/25/hybrid-model/</link>
         <guid isPermaLink="true">http://kylin.apache.org/blog/2015/09/25/hybrid-model/</guid>
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>Fast Cubing on Spark in Apache Kylin</title>
-        <description>&lt;h2 id=&quot;preparation&quot;&gt;Preparation&lt;/h2&gt;
-
-&lt;p&gt;In order to make POC phase as simple as possible, a standalone spark cluster is the best choice.&lt;br /&gt;
-So the environment setup is as below:&lt;/p&gt;
-
-&lt;ol&gt;
-  &lt;li&gt;
-    &lt;p&gt;hadoop sandbox (hortonworks hdp 2.2.0)&lt;/p&gt;
-
-    &lt;p&gt;(8 cores, 16G) * 1&lt;/p&gt;
-  &lt;/li&gt;
-  &lt;li&gt;
-    &lt;p&gt;spark (1.4.1)&lt;/p&gt;
-
-    &lt;p&gt;master:(4 cores, 8G)&lt;/p&gt;
-
-    &lt;p&gt;worker:(4 cores, 8G) * 2&lt;/p&gt;
-  &lt;/li&gt;
-&lt;/ol&gt;
-
-&lt;p&gt;The hadoop conf should also be in the SPARK_HOME/conf&lt;/p&gt;
-
-&lt;h2 id=&quot;fast-cubing-implementation-on-spark&quot;&gt;Fast Cubing Implementation on Spark&lt;/h2&gt;
-
-&lt;p&gt;Spark as a computation framework has provided much richer operators than map-reduce. And some of them are quite suitable for the cubing algorithm, for instance &lt;strong&gt;aggregate&lt;/strong&gt;.&lt;/p&gt;
-
-&lt;p&gt;As the &lt;a href=&quot;http://kylin.apache.org/blog/2015/08/15/fast-cubing/&quot; title=&quot;Fast Cubing Algorithm in Apache Kylin&quot;&gt;Fast cubing algorithm&lt;/a&gt;, it contains several steps:&lt;/p&gt;
-
-&lt;ol&gt;
-  &lt;li&gt;build dictionary&lt;/li&gt;
-  &lt;li&gt;calculate region split for hbase&lt;/li&gt;
-  &lt;li&gt;build &amp;amp; output cuboid data&lt;/li&gt;
-&lt;/ol&gt;
-
-&lt;hr /&gt;
-
-&lt;p&gt;&lt;strong&gt;build dictionary&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;To build the dictionary, the distinct values of each column are needed, which the new &lt;strong&gt;&lt;em&gt;DataFrame&lt;/em&gt;&lt;/strong&gt; API already provides (since Spark 1.3.0).&lt;/p&gt;
-
-&lt;p&gt;So after getting the data from Hive through SparkSQL, it is quite natural to use that API directly to build the dictionary.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;p&gt;&lt;strong&gt;calculate region split&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;To calculate the distribution of all cuboids, Kylin uses a HyperLogLog implementation. Each record will have a counter whose size is 16KB by default, so shuffling the counters across the cluster would be very expensive.&lt;/p&gt;
-
-&lt;p&gt;Spark has provided an operator &lt;strong&gt;&lt;em&gt;aggregate&lt;/em&gt;&lt;/strong&gt; to reduce shuffle size. It first does a map-reduce phase locally, and then another round of reduce to merge the data from each node.&lt;/p&gt;
-
-&lt;hr /&gt;
-
-&lt;p&gt;&lt;strong&gt;build &amp;amp; output cuboid data&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;To build a cube, Kylin requires a small batch that can fit into memory at one time.&lt;/p&gt;
-
-&lt;p&gt;Previously, in the map-reduce implementation, Kylin leveraged the life-cycle callback &lt;strong&gt;cleanup&lt;/strong&gt; to gather all the input together as a batch. This cannot be applied directly to the map and reduce operators in Spark, which have no such life-cycle callback.&lt;/p&gt;
-
-&lt;p&gt;However, Spark provides an operator &lt;strong&gt;&lt;em&gt;glom&lt;/em&gt;&lt;/strong&gt;, which coalesces all elements within each partition into an array. This is exactly what Kylin needs to build a small batch.&lt;/p&gt;
-
-&lt;p&gt;Once the batch data is ready, we can just apply the Fast Cubing algorithm.&lt;/p&gt;
-
-&lt;p&gt;Then the Spark API &lt;strong&gt;&lt;em&gt;saveAsNewAPIHadoopFile&lt;/em&gt;&lt;/strong&gt; allows us to write HFiles to HDFS and bulk load them into HBase.&lt;/p&gt;
-
-&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;
-
-&lt;p&gt;We use the sample data Kylin provided to build cube, total record count is 10000.&lt;/p&gt;
-
-&lt;p&gt;Below are results(system environments are mentioned above)&lt;/p&gt;
-&lt;table&gt;
-    &lt;tr&gt;
-        &lt;td&gt;&lt;/td&gt;
-        &lt;td&gt;Spark&lt;/td&gt;
-        &lt;td&gt;MR&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-        &lt;td&gt;Duration&lt;/td&gt;
-        &lt;td&gt;5.5 min&lt;/td&gt;
-        &lt;td&gt;10+ min&lt;/td&gt;
-    &lt;/tr&gt;
-&lt;/table&gt;
-
-&lt;h2 id=&quot;issues&quot;&gt;Issues&lt;/h2&gt;
-
-&lt;p&gt;HDP 2.2+ requires Hive 0.14.0, while Spark 1.3.0 only supports Hive 0.13.0, so there are several compatibility problems in hive-site.xml that we need to fix.&lt;/p&gt;
-
-&lt;ol&gt;
-  &lt;li&gt;
-    &lt;p&gt;some time-related settings&lt;/p&gt;
-
-    &lt;p&gt;There are several settings whose default values in Hive 0.14.0 cannot be parsed in 0.13.0. One example is &lt;strong&gt;hive.metastore.client.connect.retry.delay&lt;/strong&gt;, whose default value is &lt;strong&gt;5s&lt;/strong&gt;. In Hive 0.13.0 this value must be a plain Long, so you have to manually change it from &lt;strong&gt;5s&lt;/strong&gt; to &lt;strong&gt;5&lt;/strong&gt;.&lt;/p&gt;
-  &lt;/li&gt;
-  &lt;li&gt;
-    &lt;p&gt;hive.security.authorization.manager&lt;/p&gt;
-
-    &lt;p&gt;If you have enabled this configuration, its default value is &lt;strong&gt;org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory&lt;/strong&gt;, which was newly introduced in Hive 0.14.0. This means you have to use another implementation, such as &lt;strong&gt;org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider&lt;/strong&gt;.&lt;/p&gt;
-  &lt;/li&gt;
-  &lt;li&gt;
-    &lt;p&gt;hive.execution.engine&lt;/p&gt;
-
-    &lt;p&gt;In Hive 0.14.0, the default value of &lt;strong&gt;hive.execution.engine&lt;/strong&gt; is &lt;strong&gt;tez&lt;/strong&gt;. Change it to &lt;strong&gt;mr&lt;/strong&gt; in the Spark classpath; otherwise there will be a NoClassDefFoundError.&lt;/p&gt;
-  &lt;/li&gt;
-&lt;/ol&gt;
-
-&lt;p&gt;NOTE: Spark 1.4.0 has a &lt;a href=&quot;https://issues.apache.org/jira/browse/SPARK-8368&quot;&gt;bug&lt;/a&gt; that leads to a ClassNotFoundException. It has been fixed in Spark 1.4.1, so if you are planning to run on Spark 1.4.0, you may need to upgrade to 1.4.1.&lt;/p&gt;
-
-&lt;p&gt;Last but not least, when you try to run a Spark application on YARN, make sure that hive-site.xml and hbase-site.xml are in HADOOP_CONF_DIR or YARN_CONF_DIR, since by default HDP places these configuration files in separate directories.&lt;/p&gt;
-
-&lt;h2 id=&quot;next-move&quot;&gt;Next move&lt;/h2&gt;
-
-&lt;p&gt;Clearly, the above is not a fair comparison: the environments are not the same, the test data size is too small, etc.&lt;/p&gt;
-
-&lt;p&gt;However, it showed that it is practical to migrate from MR to Spark, and some useful operators in Spark will save us quite a lot of code.&lt;/p&gt;
-
-&lt;p&gt;So the next move for us is to set up a cluster and run the benchmark on a real data set for both MR and Spark.&lt;/p&gt;
-
-&lt;p&gt;We will update the benchmark once we have finished; please stay tuned.&lt;/p&gt;
-</description>
-        <pubDate>Wed, 09 Sep 2015 15:28:00 +0000</pubDate>
-        <link>http://kylin.apache.org/blog/2015/09/09/fast-cubing-on-spark/</link>
-        <guid isPermaLink="true">http://kylin.apache.org/blog/2015/09/09/fast-cubing-on-spark/</guid>
-        
-        
-        <category>blog</category>
         
       </item>
     


