kylin-commits mailing list archives

From lid...@apache.org
Subject svn commit: r1765533 - in /kylin/site: blog/2016/10/18/new-nrt-streaming/index.html feed.xml
Date Wed, 19 Oct 2016 06:02:59 GMT
Author: lidong
Date: Wed Oct 19 06:02:59 2016
New Revision: 1765533

URL: http://svn.apache.org/viewvc?rev=1765533&view=rev
Log:
minor update on the blog

Modified:
    kylin/site/blog/2016/10/18/new-nrt-streaming/index.html
    kylin/site/feed.xml

Modified: kylin/site/blog/2016/10/18/new-nrt-streaming/index.html
URL: http://svn.apache.org/viewvc/kylin/site/blog/2016/10/18/new-nrt-streaming/index.html?rev=1765533&r1=1765532&r2=1765533&view=diff
==============================================================================
--- kylin/site/blog/2016/10/18/new-nrt-streaming/index.html (original)
+++ kylin/site/blog/2016/10/18/new-nrt-streaming/index.html Wed Oct 19 06:02:59 2016
@@ -205,15 +205,15 @@
   </li>
 </ul>
 
-<p>To overcome these limitations, the Apache Kylin team developed the new streaming
(<a href="https://issues.apache.org/jira/browse/KYLIN-1726">KYLIN-1726</a>) with
Kafka 0.10 API, it has been tested internally for some time, will release to public soon.</p>
+<p>To overcome these limitations, the Apache Kylin team developed a new streaming solution
(<a href="https://issues.apache.org/jira/browse/KYLIN-1726">KYLIN-1726</a>) based on
Kafka 0.10. It has been tested internally for some time and will be released to the public soon.</p>
 
-<p>The new design is a perfect implementation under Kylin 1.5’s “Plug-in”
architecture: treat Kafka topic as a “Data Source” like Hive table, using an adapter
to extract the data to HDFS; the next steps are almost the same as from Hive. Figure 1 is
a high level architecture of the new design.</p>
+<p>The new design fits perfectly into Kylin 1.5’s “plug-in”
architecture: a Kafka topic is treated as a “Data Source”, just like a Hive table, and an adapter
extracts the data to HDFS; the subsequent steps are almost the same as for other cubes. Figure 1 shows
the high-level architecture of the new design.</p>
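
To make the “plug-in” idea above concrete, here is a minimal sketch of what such a data-source adapter contract could look like. This is an assumption for illustration only: the interface and member names (StreamingSource, SegmentRange, extractToHdfs) are not Kylin’s actual plug-in API.

// Illustrative sketch only: a hypothetical data-source adapter contract.
// StreamingSource, SegmentRange and extractToHdfs are assumed names,
// not Kylin's actual plug-in interfaces.
public interface StreamingSource {

    /** The offset range of one build, [startOffset, endOffset) for a partition. */
    class SegmentRange {
        public final long startOffset;
        public final long endOffset;

        public SegmentRange(long startOffset, long endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Extracts the messages in the given range from the source (a Kafka topic here)
     * to an HDFS directory; the later cubing steps then read from HDFS just as they
     * would for a Hive-backed cube.
     */
    void extractToHdfs(SegmentRange range, String hdfsDir) throws java.io.IOException;
}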
 
 <p><img src="/images/blog/new-streaming.png" alt="Kylin New Streaming Framework
Architecture" /></p>
 
-<p>The adapter to read Kafka messages is modified from <a href="https://github.com/amient/kafka-hadoop-loader">kafka-hadoop-loader</a>,
which is open sourced under Apache License V2.0; it starts a mapper for each Kafka partition,
reading and then saving the messages to HDFS; in next steps Kylin will be able to leverage
existing framework like MR to do the processing, this makes the solution scalable and fault-tolerant.</p>
+<p>The adapter that reads Kafka messages is modified from <a href="https://github.com/amient/kafka-hadoop-loader">kafka-hadoop-loader</a>,
which its author, Michal Harish, open sourced under Apache License V2.0; it starts a mapper for
each Kafka partition that reads the messages and saves them to HDFS, so Kylin can leverage
existing frameworks such as MapReduce to do the processing. This makes the solution scalable
and fault-tolerant.</p>
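
The sketch below shows, in simplified form, what each such mapper conceptually does: read the messages of one Kafka partition between a start and an end offset and write them to an HDFS file. It is not the actual kafka-hadoop-loader or Kylin code and it omits the MapReduce plumbing; the class and method names are assumptions.

// Illustrative sketch only (not kafka-hadoop-loader or Kylin code): dump the
// messages of one Kafka partition in [startOffset, endOffset) to an HDFS file.
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class PartitionDumper {

    public static void dump(String brokers, String topic, int partition,
                            long startOffset, long endOffset, String hdfsFile) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition(topic, partition);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assign exactly one partition and seek to the segment's start offset.
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, startOffset);

            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataOutputStream out = fs.create(new Path(hdfsFile))) {
                long next = startOffset;
                while (next < endOffset) {
                    ConsumerRecords<String, String> records = consumer.poll(1000L);
                    for (ConsumerRecord<String, String> record : records) {
                        if (record.offset() >= endOffset) {
                            return;          // reached the segment's end offset
                        }
                        out.writeBytes(record.value() + "\n");
                        next = record.offset() + 1;
                    }
                }
            }
        }
    }
}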
 
-<p>To overcome the “data loss” problem, Kylin adds the start/end offset information
on each Cube segment, and then use the offsets as the partition value (no overlap is allowed);
this ensures no data be lost and 1 message be consumed at most once. To let the late/early
message can be queried, Cube segments allow overlap for the partition time dimension: Kylin
will scan all segments which include the queried time. Figure 2 illurates this.</p>
+<p>To overcome the “data loss” limitation, Kylin records the start/end offsets
on each Cube segment and uses the offsets as the partition values (no overlap is
allowed); this ensures that no data is lost and that each message is consumed at most once. To make
late/early messages queryable, Cube segments are allowed to overlap on the partition time dimension:
each segment has a “min” date/time and a “max” date/time, and Kylin scans
all segments that match the queried time range. Figure 2 illustrates this.</p>
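
A minimal sketch of the segment layout and query pruning described above: offset ranges never overlap (they cut the segments), while the min/max event-time bounds may overlap, so a query scans every segment whose time range intersects the queried range. The class and field names are assumptions for illustration, not Kylin’s internal model.

// Illustrative sketch only: segments cut by non-overlapping offset ranges, while
// their min/max event-time bounds may overlap; a query scans every segment whose
// time range intersects the queried range. Names are assumed, not Kylin internals.
import java.util.ArrayList;
import java.util.List;

public class SegmentPruner {

    public static class Segment {
        final long startOffset, endOffset;   // partition values: no overlap between segments
        final long minTime, maxTime;         // event-time bounds: overlap is allowed

        Segment(long startOffset, long endOffset, long minTime, long maxTime) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.minTime = minTime;
            this.maxTime = maxTime;
        }
    }

    /** Returns all segments whose [minTime, maxTime] intersects [queryStart, queryEnd]. */
    public static List<Segment> segmentsToScan(List<Segment> segments, long queryStart, long queryEnd) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.maxTime >= queryStart && s.minTime <= queryEnd) {
                result.add(s);
            }
        }
        return result;
    }
}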
 
 <p><img src="/images/blog/offset-as-partition-value.png" alt="Use Offset to Cut
Segments" /></p>
 
@@ -227,23 +227,25 @@
   <li>Add REST API to check and fill the segment holes</li>
 </ul>
 
-<p>The integration test result shows big improvements than the previous version:</p>
+<p>The integration test results are promising:</p>
 
 <ul>
  <li>Scalability: it can easily process up to hundreds of millions of records in one build;</li>
-  <li>Flexibility: trigger the build at any time with the frequency you want, e.g:
every 5 minutes in day and every hour in night; Kylin manages the offsets so it can resume
from the last position;</li>
-  <li>Stability: pretty stable, no OutOfMemory error;</li>
+  <li>Flexibility: you can trigger the build at any time, with whatever frequency you want;
for example, every 5 minutes in the daytime but every hour at night, and you can even pause when
you need to do maintenance; Kylin manages the offsets, so it automatically continues from
the last position (see the sketch after this list);</li>
+  <li>Stability: pretty stable, no OutOfMemoryError;</li>
   <li>Management: user can check all jobs’ status through Kylin’s “Monitor”
page or REST API;</li>
  <li>Build Performance: in a testing cluster (8 AWS instances consuming Twitter streams,
with about 10 thousand messages arriving per second) and a 9-dimension cube with 3 measures defined, when
the build interval is 2 minutes the job finishes in around 3 minutes; if the interval is changed to 5
minutes, the build finishes in around 4 minutes;</li>
 </ul>
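
As a rough illustration of the Flexibility point above (the sketch referenced in that bullet), the following schedules a cube build request on a fixed interval. The REST endpoint path and request body here are placeholders assumed for illustration, not taken from Kylin’s documented API; the point is that the caller only picks the schedule, while Kylin tracks the consumed offsets and continues from the last position.

// Illustrative sketch only: trigger a streaming cube build on a fixed schedule.
// The REST endpoint path and JSON payload below are placeholders (assumptions),
// not a documented Kylin API; Kylin tracks the last consumed offsets, so each
// build continues from where the previous one ended.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BuildScheduler {

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Every 5 minutes in this example; the interval can be changed or paused at will.
        scheduler.scheduleAtFixedRate(BuildScheduler::triggerBuild, 0, 5, TimeUnit.MINUTES);
    }

    static void triggerBuild() {
        try {
            // Placeholder endpoint: adjust host, port and cube name for a real deployment.
            URL url = new URL("http://localhost:7070/kylin/api/cubes/my_streaming_cube/build");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            byte[] body = "{\"buildType\": \"BUILD\"}".getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            System.out.println("Build triggered, HTTP " + conn.getResponseCode());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}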
 
-<p>Here are a couple of screenshots in this test:<br />
+<p>Here are a couple of screenshots from this test; we may turn them into a step-by-step
tutorial in the future:<br />
 <img src="/images/blog/streaming-monitor.png" alt="Streaming Job Monitoring" /></p>
 
 <p><img src="/images/blog/streaming-adapter.png" alt="Streaming Adapter" /></p>
 
 <p><img src="/images/blog/streaming-twitter.png" alt="Streaming Twitter Sample"
/></p>
 
+<p>In short, this is a more robust Near Real Time Streaming OLAP solution (compared
with the previous version). Next, the Apache Kylin team will move toward a real-time engine.</p>
+
   </article>
 
 </div>

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1765533&r1=1765532&r2=1765533&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Wed Oct 19 06:02:59 2016
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 18 Oct 2016 07:59:25 -0700</pubDate>
-    <lastBuildDate>Tue, 18 Oct 2016 07:59:25 -0700</lastBuildDate>
+    <pubDate>Wed, 19 Oct 2016 06:59:18 -0700</pubDate>
+    <lastBuildDate>Wed, 19 Oct 2016 06:59:18 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
@@ -44,15 +44,15 @@
   &lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;To overcome these limitations, the Apache Kylin team developed the new streaming
(&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1726&quot;&gt;KYLIN-1726&lt;/a&gt;)
with Kafka 0.10 API, it has been tested internally for some time, will release to public soon.&lt;/p&gt;
+&lt;p&gt;To overcome these limitations, the Apache Kylin team developed a new streaming solution
(&lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1726&quot;&gt;KYLIN-1726&lt;/a&gt;)
based on Kafka 0.10. It has been tested internally for some time and will be released to the public soon.&lt;/p&gt;
 
-&lt;p&gt;The new design is a perfect implementation under Kylin 1.5’s “Plug-in”
architecture: treat Kafka topic as a “Data Source” like Hive table, using an adapter
to extract the data to HDFS; the next steps are almost the same as from Hive. Figure 1 is
a high level architecture of the new design.&lt;/p&gt;
+&lt;p&gt;The new design fits perfectly into Kylin 1.5’s “plug-in”
architecture: a Kafka topic is treated as a “Data Source”, just like a Hive table, and an adapter
extracts the data to HDFS; the subsequent steps are almost the same as for other cubes. Figure 1 shows
the high-level architecture of the new design.&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/new-streaming.png&quot; alt=&quot;Kylin
New Streaming Framework Architecture&quot; /&gt;&lt;/p&gt;
 
-&lt;p&gt;The adapter to read Kafka messages is modified from &lt;a href=&quot;https://github.com/amient/kafka-hadoop-loader&quot;&gt;kafka-hadoop-loader&lt;/a&gt;,
which is open sourced under Apache License V2.0; it starts a mapper for each Kafka partition,
reading and then saving the messages to HDFS; in next steps Kylin will be able to leverage
existing framework like MR to do the processing, this makes the solution scalable and fault-tolerant.&lt;/p&gt;
+&lt;p&gt;The adapter that reads Kafka messages is modified from &lt;a href=&quot;https://github.com/amient/kafka-hadoop-loader&quot;&gt;kafka-hadoop-loader&lt;/a&gt;,
which its author, Michal Harish, open sourced under Apache License V2.0; it starts a mapper for
each Kafka partition that reads the messages and saves them to HDFS, so Kylin can leverage
existing frameworks such as MapReduce to do the processing. This makes the solution scalable
and fault-tolerant.&lt;/p&gt;
 
-&lt;p&gt;To overcome the “data loss” problem, Kylin adds the start/end
offset information on each Cube segment, and then use the offsets as the partition value (no
overlap is allowed); this ensures no data be lost and 1 message be consumed at most once.
To let the late/early message can be queried, Cube segments allow overlap for the partition
time dimension: Kylin will scan all segments which include the queried time. Figure 2 illurates
this.&lt;/p&gt;
+&lt;p&gt;To overcome the “data loss” limitation, Kylin records the start/end
offsets on each Cube segment and uses the offsets as the partition values (no
overlap is allowed); this ensures that no data is lost and that each message is consumed at most once. To
make late/early messages queryable, Cube segments are allowed to overlap on the partition time
dimension: each segment has a “min” date/time and a “max” date/time, and Kylin
scans all segments that match the queried time range. Figure 2 illustrates this.&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/offset-as-partition-value.png&quot;
alt=&quot;Use Offset to Cut Segments&quot; /&gt;&lt;/p&gt;
 
@@ -66,22 +66,24 @@
   &lt;li&gt;Add REST API to check and fill the segment holes&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;The integration test result shows big improvements than the previous version:&lt;/p&gt;
+&lt;p&gt;The integration test results are promising:&lt;/p&gt;
 
 &lt;ul&gt;
  &lt;li&gt;Scalability: it can easily process up to hundreds of millions of records
in one build;&lt;/li&gt;
-  &lt;li&gt;Flexibility: trigger the build at any time with the frequency you want,
e.g: every 5 minutes in day and every hour in night; Kylin manages the offsets so it can resume
from the last position;&lt;/li&gt;
-  &lt;li&gt;Stability: pretty stable, no OutOfMemory error;&lt;/li&gt;
+  &lt;li&gt;Flexibility: you can trigger the build at any time, with whatever frequency
you want; for example, every 5 minutes in the daytime but every hour at night, and you can even
pause when you need to do maintenance; Kylin manages the offsets, so it automatically continues
from the last position;&lt;/li&gt;
+  &lt;li&gt;Stability: pretty stable, no OutOfMemoryError;&lt;/li&gt;
   &lt;li&gt;Management: user can check all jobs’ status through Kylin’s
“Monitor” page or REST API;&lt;/li&gt;
  &lt;li&gt;Build Performance: in a testing cluster (8 AWS instances consuming Twitter
streams, with about 10 thousand messages arriving per second) and a 9-dimension cube with 3 measures
defined, when the build interval is 2 minutes the job finishes in around 3 minutes; if the interval is
changed to 5 minutes, the build finishes in around 4 minutes;&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;Here are a couple of screenshots in this test:&lt;br /&gt;
+&lt;p&gt;Here are a couple of screenshots from this test; we may turn them into a step-by-step
tutorial in the future:&lt;br /&gt;
 &lt;img src=&quot;/images/blog/streaming-monitor.png&quot; alt=&quot;Streaming
Job Monitoring&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/streaming-adapter.png&quot; alt=&quot;Streaming
Adapter&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;&lt;img src=&quot;/images/blog/streaming-twitter.png&quot; alt=&quot;Streaming
Twitter Sample&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;In short, this is a more robust Near Real Time Streaming OLAP solution (compared
with the previous version). Next, the Apache Kylin team will move toward a real-time engine.&lt;/p&gt;
 </description>
         <pubDate>Tue, 18 Oct 2016 10:30:00 -0700</pubDate>
         <link>http://kylin.apache.org/blog/2016/10/18/new-nrt-streaming/</link>


