samza-dev mailing list archives

From Chris Riccomini <criccom...@linkedin.com.INVALID>
Subject Re: Questions on topic creation
Date Tue, 14 Oct 2014 22:23:26 GMT
Hey Roger,

> Do I need to manually create the KV store changelog topic?

Yes, unfortunately you do need to create it manually at the moment.
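
For reference, a minimal sketch of what creating the changelog topic by hand might look like. It only builds the argument list for Kafka's `kafka-topics.sh` tool; the helper name, topic name, ZooKeeper address, and counts are all hypothetical placeholders. It assumes the changelog should have one partition per task and should be log-compacted (`cleanup.policy=compact`), and depending on the Kafka version the broker's log cleaner may also need to be enabled.

```python
def changelog_topic_command(topic, partitions, replication_factor, zookeeper):
    """Build a kafka-topics.sh invocation that creates a log-compacted topic.

    All argument values are supplied by the caller; nothing here is
    read from a Samza job config.
    """
    return [
        "kafka-topics.sh",
        "--create",
        "--zookeeper", zookeeper,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        # Compaction keeps the latest value per key instead of expiring by time.
        "--config", "cleanup.policy=compact",
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust the name, partition count, and hosts.
    cmd = changelog_topic_command("my-store-changelog", 4, 2, "localhost:2181")
    print(" ".join(cmd))
```

Printing the command rather than running it makes it easy to review before executing it against a real cluster.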

> I saw this ticket (https://issues.apache.org/jira/browse/SAMZA-226) but
>it looks like it's still open.

Yep, that's the ticket to fix the above issue. :) It is indeed still open.

> Do checkpoint topics get created?

Yes.

> Are jobs tasks assigned to partitions of a shared checkpoint topic or do
>they each get their own checkpoint topic?

In 0.7.0, each task got its own partition. In 0.8.0 (post-SAMZA-123), the
checkpoint topic is single partition, and all tasks in one job share this
partition. Note that jobs each still have their own checkpoint topics. The
SAMZA-123 JIRA has a design doc that Jakob wrote, which describes how the
checkpoint topic works. For the legacy 0.7.0 checkpoint topics, you can
find docs here:

  
http://samza.incubator.apache.org/learn/documentation/0.7.0/container/checkpointing.html
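
As a toy illustration of the shared single-partition layout (not Samza's actual code): if every task in a job writes its checkpoints to the same partition, a reader can recover each task's position by replaying the log in order and keeping the last entry seen per task. The message shape used here, a `(task_name, offsets)` pair, is an assumption made for the sketch; the SAMZA-123 design doc describes the real format.

```python
def latest_checkpoints(messages):
    """Replay checkpoint messages in log order, keeping the newest per task.

    `messages` is an iterable of (task_name, offsets) pairs, a hypothetical
    stand-in for the records in a job's single-partition checkpoint topic.
    """
    latest = {}
    for task_name, offsets in messages:
        # Later messages overwrite earlier ones for the same task.
        latest[task_name] = offsets
    return latest


if __name__ == "__main__":
    log = [
        ("task-0", {"input-topic": 10}),
        ("task-1", {"input-topic": 7}),
        ("task-0", {"input-topic": 25}),
    ]
    print(latest_checkpoints(log))
```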


> Should I proceed with this version or would it make life easier to use
>trunk or something closer to 0.8.0?

I would recommend using 0.8.0 (master). We've not yet released it, but
that's mostly since we're waiting on SAMZA-236. We've been running 0.8.0
at LinkedIn for several large jobs (600k-800k msgs/sec), and it's been
pretty solid. It also has a ton of performance improvements, a new UI,
etc.

> Anything else I need to watch out for?

If you're already running with 0.7.0, you'll either need to abandon your
checkpoints, or wait for SAMZA-354. The 0.8.0 checkpoint topic changes
were backwards incompatible, and thus we are adding an auto-migration
feature, which hasn't yet been written (though it's being worked on right
now).

Cheers,
Chris

On 10/14/14 1:04 PM, "Roger Hoover" <roger.hoover@gmail.com> wrote:

>Hi all,
>
>I want to deploy a Samza job in a pre-production environment and need to
>figure out how to handle configuration of the various topics.  In
>particular, I want to make sure topics like the KV store changelog are
>configured to be compacted so that data isn't lost over time.
>
>Do I need to manually create the KV store changelog topic?  I saw this
>ticket (https://issues.apache.org/jira/browse/SAMZA-226) but it looks like
>it's still open.
>
>Do checkpoint topics get created?  If not, what does the
>"task.checkpoint.replication.factor" configuration do?  Are jobs tasks
>assigned to partitions of a shared checkpoint topic or do they each get
>their own checkpoint topic?
>
>So far I've developed my proof of concept job with 0.7.0.  Should I proceed
>with this version or would it make life easier to use trunk or something
>closer to 0.8.0?
>
>Anything else I need to watch out for?
>
>Thanks,
>
>Roger

