samza-dev mailing list archives

From Chris Riccomini <criccom...@linkedin.com.INVALID>
Subject Re: Store changelog stream configuration and use
Date Fri, 07 Nov 2014 21:29:53 GMT
Hey Jordan,

Oops, sorry, email client got overzealous. Let me try again:

> Is it acceptable to have other Samza jobs listen to the changelog
>streams?

Yes, in fact, it's encouraged! This is one of the reasons that we don't
impose any message format on the change log message payload. The change
log streams are just normal streams with normal serialized messages that
can be consumed by downstream jobs.

> How are the changelog topics meant to be partitioned? It seems like
>Samza creates them automatically with just a single partition, but would
>it make more sense to have them partitioned based on the number of
>partitions that the job that owns the store has?

Yes, their partition count should equal the partition count of the job.
You have good timing. Naveen is actually in the process of adding
auto-create for change logs right now:


  https://issues.apache.org/jira/browse/SAMZA-226

> I let Samza autocreate the topic, for instance - perhaps I need to
>create it by hand with some special parameters?

I think you'll need to create the topic manually. It's not actually Samza
creating the topic (we don't have that feature yet); it's Kafka doing it
when the topic is first used. Kafka doesn't know how many partitions are
required, so it uses the default (configurable on the broker). It's likely
that you have only 1 partition on your changelog topic, but more than one
on your input stream.
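
As a rough illustration of creating the topic by hand, the sketch below
builds a kafka-topics.sh command line whose partition count matches the
job's input stream. The topic name, ZooKeeper address, replication factor,
and the helper function itself are hypothetical placeholders, and it
assumes the job's partition count equals that of its input stream; adjust
them for your cluster.

```python
import shlex


def changelog_create_command(topic, input_partitions,
                             zookeeper="localhost:2181",
                             replication_factor=1):
    """Build a kafka-topics.sh command that creates a changelog topic.

    All argument values are illustrative assumptions; the partition count
    is set to the input stream's partition count, as suggested above.
    """
    if input_partitions < 1:
        raise ValueError("input_partitions must be at least 1")
    args = [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--topic", topic,
        "--partitions", str(input_partitions),
        "--replication-factor", str(replication_factor),
    ]
    # Quote each argument so the result can be pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical example: an input stream with 4 partitions.
    print(changelog_create_command("my-store-changelog", 4))
```

Running the printed command before the job first starts should keep the
broker from auto-creating the topic with its default partition count.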

Cheers,
Chris

On 11/7/14 1:25 PM, "Chris Riccomini" <criccomini@linkedin.com> wrote:

>Hey Jordan,
>
>> Is it acceptable to have other Samza jobs listen to the changelog
>>streams?
>
>Yes, in fact, it's encouraged! This is one of the reasons that we don't
>impose any message format on the change log message payload. The change
>log streams are just normal streams with normal serialized messages that
>can be consumed from downstream jobs.
>
>> How are the changelog topics meant to be partitioned? It seems like
>>Samza creates them automatically with just a single partition, but would
>>it make more sense to have them partitioned based on the number of
>>partitions that the job that owns the store has?
>
>Yes, their partition count should equal the partition count of the job.
>You have good timing. Naveen is actually in the process of adding
>auto-create for change logs right now:
>
>
>Cheers,
>Chris
>
>On 11/7/14 1:13 PM, "Jordan Lewis" <jordan@knewton.com> wrote:
>
>>Hi,
>>
>>I have a few questions about the Store changelog Kafka topics.
>>
>>1. Is it acceptable to have other Samza jobs listen to the changelog
>>streams? It seems like they're named automatically by the system, so I'm
>>not sure whether they're designed to be used except internally for Store
>>replays on node spin up. On the other hand, it seems like a really
>>desirable and common use case to have jobs listen to changes in a store.
>>
>>2. How are the changelog topics meant to be partitioned? It seems like
>>Samza creates them automatically with just a single partition, but would
>>it make more sense to have them partitioned based on the number of
>>partitions that the job that owns the store has?
>>
>>Finally, I've been getting an exception when trying to use changelogged
>>stores. It is the same exception as is mentioned in SAMZA-169
>><https://issues.apache.org/jira/browse/SAMZA-169>, which is resolved. I
>>wonder if I'm missing some important bootstrap steps? I let Samza
>>autocreate the topic, for instance - perhaps I need to create it by hand
>>with some special parameters?
>>
>>Thanks,
>>Jordan Lewis
>

