samza-dev mailing list archives

From Thunder Stumpges <tstump...@ntent.com>
Subject RE: Old style "low level" Tasks with alternative deployment model(s)
Date Thu, 15 Mar 2018 04:24:52 GMT
Wow, what great timing, and what a great thread! I definitely have some good starters to go
off of here.

If it is helpful for everyone, once I get the low-level API + ZkJobCoordinator + Docker +
K8s working, I'd be glad to formulate an additional sample for hello-samza. 

One thing I'm still curious about: what are the drawbacks or complexities of leveraging
the Kafka high-level consumer + PassthroughJobCoordinator in a standalone setup like this?
We do have ZooKeeper (because of Kafka), so I think either would work. The Kafka high-level
consumer also comes with other nice tools for monitoring offsets, lag, etc.

Thanks guys!
-Thunder

-----Original Message-----
From: Tom Davis [mailto:tom@recursivedream.com] 
Sent: Wednesday, March 14, 2018 17:50
To: dev@samza.apache.org
Subject: Re: Old style "low level" Tasks with alternative deployment model(s)

Hey there!

You are correct that this is focused on the higher-level API, but it doesn't preclude using
the lower-level API. I was at the same point you were not long ago, in fact, and had a very
productive conversation on the list: look for "Question about custom StreamJob/Factory" in
the list archive for the past couple of months.

I'll quote Jagadish Venkatraman from that thread:

> For the section on the low-level API, can you use 
> LocalApplicationRunner#runTask()? It basically creates a new 
> StreamProcessor and runs it. Remember to provide task.class and set it 
> to your implementation of StreamTask or AsyncStreamTask. Please note 
> that this is an evolving API and hence, subject to change.
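
For illustration, the configuration that quote describes might be assembled as in the sketch below. Only task.class is named in the quote; the other key names and factory class names are assumptions recalled from the Samza standalone documentation and should be checked against the release in use. com.example.MyStreamTask and the ZooKeeper address are hypothetical placeholders. The sketch uses only the JDK, so it shows the shape of the config rather than actually starting a StreamProcessor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StandaloneTaskConfigSketch {

    // Collects the settings a low-level task would plausibly need when run
    // in-process instead of on YARN. Every key except task.class is an
    // assumption, not something confirmed in this thread.
    public static Map<String, String> buildConfig(String taskClass, String zkConnect) {
        Map<String, String> config = new LinkedHashMap<>();
        // From the quote: point task.class at your StreamTask/AsyncStreamTask implementation.
        config.put("task.class", taskClass);
        // Assumed: selects the in-process runner.
        config.put("app.runner.class", "org.apache.samza.runtime.LocalApplicationRunner");
        // Assumed: ZooKeeper-based coordination between processor instances.
        config.put("job.coordinator.factory", "org.apache.samza.zk.ZkJobCoordinatorFactory");
        config.put("job.coordinator.zk.connect", zkConnect);
        return config;
    }

    public static void main(String[] args) {
        // Hypothetical task class and ZooKeeper address, for illustration only.
        Map<String, String> config = buildConfig("com.example.MyStreamTask", "localhost:2181");
        config.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The printed key=value lines could be saved as a properties file and baked into the Docker image, with each replica of the container running the same configuration.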

I ended up just switching to the high-level API because I don't have any existing Tasks and
the Kubernetes story is a little more straightforward there (there's only one
container/configuration to deploy).

Best,

Tom

Thunder Stumpges <tstumpges@ntent.com> writes:

> Hi all,
>
> We are using Samza (0.12.0) in about 2 dozen jobs implementing several 
> processing pipelines. We have also begun a significant move of other 
> services within our company to Docker/Kubernetes. Right now our 
> Hadoop/Yarn cluster has a mix of stream and batch "Map Reduce" jobs
> (many reporting and other batch processing jobs). We would really like
> to move our stream processing off of Hadoop/Yarn and onto Kubernetes.
>
> When I just read about some of the new progress in 0.13 and 0.14 I got 
> really excited! We would love to have our jobs run as simple libraries 
> in our own JVM, and use the Kafka High-Level-Consumer for partition
> distribution and such. This would let us "dockerfy" our application and
> run/scale in Kubernetes.
>
> However, as I read it, this new deployment model is ONLY for the 
> new(er) High Level API, correct? Is there a plan and/or resources for 
> adapting this back to existing low-level tasks? How complicated of a
> task is that? Do I have any other options to make this transition easier?
>
> Thanks in advance.
> Thunder
