kylin-user mailing list archives

From Andras Nagy <>
Subject Re: Re: Kylin streaming questions
Date Mon, 24 Jun 2019 15:28:40 GMT
Dear Ma,

Thanks for your reply.

Slightly related to my original question on the hybrid model, I was
wondering if it's possible to combine a batch and a streaming cube. I
realized this is not possible, as a hybrid model can only be created from
cubes of the same model (and a model points to either a batch or a
streaming datasource).

The use case would be this:
- we have a large amount of streaming data in Kafka that we would like to
process with Kylin streaming
- Kafka retention is only a few days, so if we need to change anything in
the cubes (e.g. introduce a new metric or dimension which has been present
in the events, but not in the cube definition), we can only reprocess a few
days' worth of data in the streaming model
- the raw events are also written to a data lake for long-term storage
- the data written to the data lake could be used to feed the historic data
into a batch kylin model (and cubes)
- I'm looking for a way to combine these, so if we want to change anything
in the cubes, we can recalculate them for the historic data as well

Is there a way to achieve this with current Kylin? (Without implementing a
custom query layer that combines the two cubes.)
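To illustrate what I mean by a custom query layer, it would roughly have to do a lambda-style merge like the sketch below (a minimal Python sketch; the row format, the cutoff rule, and how results are fetched from each cube are all my assumptions, not anything Kylin provides):

```python
# Minimal sketch of lambda-style merging: serve historic results from the
# batch cube and recent results from the streaming cube, split at a cutoff
# timestamp so each event is counted exactly once. The (timestamp, value)
# row format and the cutoff rule are assumptions for illustration only.

def merge_results(batch_rows, streaming_rows, cutoff):
    """Take batch rows strictly before `cutoff` and streaming rows at or
    after it, then return them in timestamp order."""
    historic = [(ts, v) for ts, v in batch_rows if ts < cutoff]
    recent = [(ts, v) for ts, v in streaming_rows if ts >= cutoff]
    return sorted(historic + recent)

# Example: the batch cube covers all history (overlapping the cutoff),
# the streaming cube covers only the short Kafka retention window.
batch = [(1, 10), (2, 20), (3, 30)]
streaming = [(3, 31), (4, 40)]
print(merge_results(batch, streaming, cutoff=3))
# [(1, 10), (2, 20), (3, 31), (4, 40)]
```

This is exactly the duplication of query logic I would prefer Kylin to handle for us.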

Best regards,

On Fri, Jun 14, 2019 at 6:43 AM Ma Gang <> wrote:

> Hi Andras,
> Currently it doesn't support consuming from specified offsets; it only
> supports consuming from startOffset or latestOffset. If you want to
> consume from startOffset, you need to set the
> configuration: to false in the cube's
> overrides page.
> If you do need to start from specified offsets, please create a jira
> request, but I think it is hard for users to know what offsets should be
> set for all partitions.
> At 2019-06-13 22:34:59, "Andras Nagy" <>
> wrote:
> Dear Ma,
> Thank you very much!
> >1) Yes, you can specify a configuration in the new cube to consume data
> from the start offset
> That is, an offset value for each partition of the topic? That would be
> good - could you please show me where to do this in practice, or point me
> to what I should read? (I haven't found it on the cube designer UI -
> perhaps this is something that's only available via the API?)
> Many thanks,
> Andras
> On Thu, Jun 13, 2019 at 1:14 PM Ma Gang <> wrote:
>> Hi Andras,
>> 1) Yes, you can specify a configuration in the new cube to consume data
>> from the start offset
>> 2) It should work, but I haven't tested it yet
>> 3) As I remember, we currently use the Kafka 1.0 client library, so it is
>> better to use that version or later. I'm sure that versions before 0.9.0
>> cannot work, but I'm not sure whether 0.9.x works or not
>> Ma Gang
>> Email
>> <>
>> Signed via NetEase Mail Master <>
>> On 06/13/2019 18:01, Andras Nagy <> wrote:
>> Greetings,
>> I have a few questions related to the new streaming (real-time OLAP)
>> implementation.
>> 1) Is there a way to have data reprocessed from Kafka? E.g. if I change a
>> cube definition and drop the cube (or add a new cube definition), can data
>> that is still available on Kafka be reprocessed to build the changed cube
>> (or the new cube)? Is this possible?
>> 2) Does the hybrid model work with streaming cubes (to combine two cubes)?
>> 3) What is the minimum Kafka version required? The tutorial asks to
>> install Kafka 1.0; is this the minimum required version?
>> Thank you very much,
>> Andras
