kylin-commits mailing list archives

From shaofeng...@apache.org
Subject [11/45] incubator-kylin git commit: Add blog for hybrid model
Date Mon, 26 Oct 2015 09:07:57 GMT
Add blog for hybrid model


Project: http://git-wip-us.apache.org/repos/asf/incubator-kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kylin/commit/be9c4846
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kylin/tree/be9c4846
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kylin/diff/be9c4846

Branch: refs/heads/master
Commit: be9c48464af29fff2f275e6d187d5ada018dc789
Parents: c611d7e
Author: shaofengshi <shaofengshi@apache.org>
Authored: Fri Sep 25 23:30:54 2015 +0800
Committer: shaofengshi <shaofengshi@apache.org>
Committed: Fri Sep 25 23:31:31 2015 +0800

----------------------------------------------------------------------
 website/_dev/dev_env.md                        |   5 +-
 website/_docs/index.md                         |   4 +-
 website/_posts/blog/2015-09-22-hybrid-model.md | 128 ++++++++++++++++++++
 website/images/blog/hybrid-model.png           | Bin 0 -> 118183 bytes
 4 files changed, 132 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/be9c4846/website/_dev/dev_env.md
----------------------------------------------------------------------
diff --git a/website/_dev/dev_env.md b/website/_dev/dev_env.md
index 20bb69f..a14fc2f 100644
--- a/website/_dev/dev_env.md
+++ b/website/_dev/dev_env.md
@@ -75,8 +75,9 @@ Run a end-to-end cube building test, these special test cases will populate some
 It might take a while (maybe one hour), please keep patient.
  
 {% highlight Groff markup %}
-	mvn test -Dtest=org.apache.kylin.job.BuildCubeWithEngineTest -DfailIfNoTests=false -P sandbox
-	mvn test -Dtest=org.apache.kylin.job.BuildIIWithEngineTest -DfailIfNoTests=false -P sandbox
+	mvn test -Dtest=org.apache.kylin.job.BuildCubeWithEngineTest -DfailIfNoTests=false -Dhdp.version=<hdp-version> -P sandbox
+	
+	mvn test -Dtest=org.apache.kylin.job.BuildIIWithEngineTest -DfailIfNoTests=false -Dhdp.version=<hdp-version> -P sandbox
 {% endhighlight %}
 	
 Run other tests, the end-to-end cube building test is exclueded

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/be9c4846/website/_docs/index.md
----------------------------------------------------------------------
diff --git a/website/_docs/index.md b/website/_docs/index.md
index 64e1295..c91054e 100644
--- a/website/_docs/index.md
+++ b/website/_docs/index.md
@@ -41,9 +41,7 @@ Advanced Topics
 
 1.[Check Kylin Metadata Store](howto/howto_backup_metadata.html)
 
-2.[Clean/Export Kylin HBase data](howto/howto_backup.html)
-
-3.[Advanced settings of Kylin environment](install/advance_settings.html)
+2.[Advanced settings of Kylin environment](install/advance_settings.html)
 
 ---
 

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/be9c4846/website/_posts/blog/2015-09-22-hybrid-model.md
----------------------------------------------------------------------
diff --git a/website/_posts/blog/2015-09-22-hybrid-model.md b/website/_posts/blog/2015-09-22-hybrid-model.md
new file mode 100644
index 0000000..1e0f4c7
--- /dev/null
+++ b/website/_posts/blog/2015-09-22-hybrid-model.md
@@ -0,0 +1,128 @@
+---
+layout: post-blog
+title:  "Hybrid Model in Apache Kylin 1.0"
+date:   2015-09-25 16:00:00
+author: Shaofeng Shi
+categories: blog
+---
+
+**Apache Kylin v1.0 introduces a new realization, the "hybrid model" (also called the "dynamic model"). This post introduces the concept and shows how to define one.**
+
+# Problem
+
+For each incoming SQL query, Kylin picks exactly one realization to serve it. Before the hybrid model, only one type of realization was available to users: the Cube. That is to say, only one Cube would be selected to answer a query.
+
+Let's start with a sample case. Assume a user has a Cube called "Cube_V1" that has been built for a couple of months. Now the user wants to add new dimensions or metrics to fulfill a business need, so they create a new Cube named "Cube_V2".
+
+For some reason the user wants to keep the data of "Cube_V1" and expects to build "Cube_V2" from the end date of "Cube_V1". Possible reasons include:
+
+* The historical source data has been dropped from Hadoop, so it is not possible to build "Cube_V2" from the very beginning;
+* The cube is large, and rebuilding it takes a very long time;
+* The new dimensions/metrics are only meaningful for new dates, or the user is fine with them being absent from the old cube; etc.
+
+For queries that don't use the new dimensions or metrics, the user hopes both "Cube_V1" and "Cube_V2" can be scanned to get a full result, such as "select count(*)..." or "select sum(price)...". With this background, the "hybrid model" was introduced in Kylin.
+
+## Hybrid Model
+
+A hybrid model is a new realization that is a composite of one or more other realizations (cubes); see the figure below.
+
+![]( /images/blog/hybrid-model.png)
+
+A hybrid has no storage of its own; it is like a virtual database view over tables. It acts as a delegator that forwards requests to its child realizations.
+
+## How to add a Hybrid model
+
+As there is no UI for creating or editing a hybrid model, you need to edit the Kylin metadata manually if you need one.
+
+### Step 1: Take a backup of the Kylin metadata store
+
+```
+export KYLIN_HOME="/path/to/kylin"
+
+$KYLIN_HOME/bin/metastore.sh backup
+
+```
+
+A backup folder will be created; assume it is $KYLIN_HOME/metadata_backup/2015-09-25/
+ 
+### Step 2: Create a sub-folder "hybrid" in the metadata folder
+
+```
+mkdir -p $KYLIN_HOME/metadata_backup/2015-09-25/hybrid
+```
+
+### Step 3: Create a hybrid JSON file
+
+```
+vi $KYLIN_HOME/metadata_backup/2015-09-25/hybrid/my_hybrid.json
+
+```
+
+Input content like this:
+
+```
+{
+  "uuid": "9iiu8590-64b6-4367-8fb5-7500eb95fd9c",
+  "name": "my_hybrid",
+  "realizations": [
+    {
+      "type": "CUBE",
+      "realization": "Cube_V1"
+    },
+    {
+      "type": "CUBE",
+      "realization": "Cube_V2"
+    }
+  ]
+}
+
+```
+Here "Cube_V1" and "Cube_V2" are the cubes that you want to combine.
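The descriptor from Step 3 could also be generated by a script rather than typed by hand. The following is a minimal Python sketch, assuming the fields shown in the sample JSON are all that is needed. The helper name `build_hybrid` is hypothetical and not part of Kylin, and filling `uuid` with a random UUID is an assumption.

```python
import json
import uuid


def build_hybrid(name, cube_names):
    """Return a hybrid descriptor dict in the shape shown in Step 3.

    Hypothetical helper, not part of Kylin; it only mirrors the fields
    that appear in the sample JSON.
    """
    return {
        # Assumption: any unique id string is acceptable for "uuid".
        "uuid": str(uuid.uuid4()),
        "name": name,
        "realizations": [
            {"type": "CUBE", "realization": cube} for cube in cube_names
        ],
    }


if __name__ == "__main__":
    # Print the descriptor so it can be redirected into hybrid/my_hybrid.json.
    print(json.dumps(build_hybrid("my_hybrid", ["Cube_V1", "Cube_V2"]), indent=2))
```

The cubes are listed in the order given, so passing the older cube first keeps the file close to the hand-written sample.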
+
+
+### Step 4: Add hybrid model to project
+
+Open the project JSON file (for example, for project "default") with a text editor:
+
+```
+vi $KYLIN_HOME/metadata_backup/2015-09-25/project/default.json
+
+```
+
+In the "realizations" array, add one entry like:
+
+```
+    {
+      "name": "my_hybrid",
+      "type": "HYBRID",
+      "realization": "my_hybrid"
+    }
+```
+
+### Step 5: Upload the metadata:
+
+```
+  $KYLIN_HOME/bin/metastore.sh restore $KYLIN_HOME/metadata_backup/2015-09-25/
+
+```
+
+### Step 6: Reload metadata
+
+Restart the Kylin server, or click "Reload metadata" in the "Admin" tab of the Kylin web UI, to load the changes. The hybrid should then start to work; you can run some queries to verify it.
+
+## FAQ:
+
+**Question 1**: When will a hybrid be selected to serve a query?
+If one of its cubes can answer the query, the hybrid that has it as a child will be selected.
+
+**Question 2**: How does a hybrid answer a query?
+The hybrid delegates the query to each of its child realizations that is capable of answering it, and then returns all the results to the query engine. The query engine aggregates them before returning to the user.
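To illustrate why queries such as "select sum(price)..." can be answered across several cubes, here is a conceptual Python sketch of merging per-cube partial results. It is not Kylin's implementation: the function name `merge_partial_sums` and the sample numbers are invented for illustration, and it covers only additive aggregates such as sum and count.

```python
from collections import defaultdict


def merge_partial_sums(child_results):
    """Merge per-cube partial sums, keyed by group, into one result.

    Conceptual illustration only; each element of `child_results` is a
    dict mapping a group key to that cube's partial sum.
    """
    total = defaultdict(int)
    for partial in child_results:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


if __name__ == "__main__":
    # Made-up partial sum(price) per country from each child cube.
    cube_v1 = {"US": 100, "CN": 40}
    cube_v2 = {"US": 25, "JP": 10}
    print(merge_partial_sums([cube_v1, cube_v2]))
```

A count(*) merges the same way, by adding up the per-cube counts for each group.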
+
+**Question 3**: Will a hybrid check for data duplication?
+No; it is up to you to ensure that the cubes in a hybrid don't have overlapping date/time ranges. For example, if "Cube_V1" ends at 2015-9-20 (inclusive), "Cube_V2" should start from 2015-9-21 or later.
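Since the hybrid does not perform this check itself, a small script can do it before the metadata is uploaded. The sketch below assumes you already know each cube's inclusive start and end dates. The function name `find_overlaps` is hypothetical, and every date other than the 2015-9-20 and 2015-9-21 boundary from the example is made up.

```python
from datetime import date


def find_overlaps(ranges):
    """Return pairs of cube names whose inclusive date ranges overlap.

    `ranges` maps a cube name to an inclusive (start, end) pair of dates.
    Hypothetical helper, not part of Kylin.
    """
    # Sort by start date so each cube is compared only with later starters.
    items = sorted(ranges.items(), key=lambda kv: kv[1][0])
    overlaps = []
    for i, (name_a, (_, end_a)) in enumerate(items):
        for name_b, (start_b, _) in items[i + 1:]:
            if start_b <= end_a:  # inclusive ends: same day counts as overlap
                overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    ranges = {
        "Cube_V1": (date(2015, 6, 1), date(2015, 9, 20)),
        "Cube_V2": (date(2015, 9, 21), date(2015, 10, 20)),
    }
    print(find_overlaps(ranges))
```

An empty list means no two cubes cover the same day; any pair returned would be double-counted by queries that the hybrid sends to both cubes.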
+
+**Question 4**: Will a hybrid require its child cubes to have the same data model?
+No; a hybrid doesn't check the cubes' fact/lookup tables or join conditions at all.
+
+**Question 5**: Can a hybrid have another hybrid as a child?
+No; there has been no need for it, and so far it assumes all children are Cubes.

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/be9c4846/website/images/blog/hybrid-model.png
----------------------------------------------------------------------
diff --git a/website/images/blog/hybrid-model.png b/website/images/blog/hybrid-model.png
new file mode 100644
index 0000000..5fd476c
Binary files /dev/null and b/website/images/blog/hybrid-model.png differ

