samza-dev mailing list archives

From Yan Fang <yanfang...@gmail.com>
Subject Re: Samza ZooKeeper Connection Inquiry
Date Thu, 04 Jun 2015 01:28:24 GMT
Hi Chas,

The actual connection does not happen in the StreamTask, but in the
Consumer/Producer layer. One way to understand this is:

* Consumers run in a separate thread, which keeps the connection to the
server (Kafka broker, wiki API, etc.) and fetches messages from it;
* StreamTask holds the logic you want to apply to the messages;
* TaskInstance gets messages from the Consumers and then runs StreamTask to
process them.

This description is not very precise, but the basic workflow is correct.
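
To make the workflow above concrete, here is a minimal, self-contained Java
sketch. It does not use Samza's classes: the names WorkflowSketch, Task,
Consumer and runLoop are hypothetical stand-ins for StreamTask, a system
consumer and the TaskInstance loop, and the "server" is just an in-memory
list. It only illustrates the split between the thread that fetches messages
and the code that processes them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: none of these types are Samza's real classes.
class WorkflowSketch {

    /** Stand-in for a StreamTask: message logic only, no connection code. */
    interface Task {
        void process(String message, List<String> output);
    }

    /** Stand-in for a consumer: owns the "connection" and buffers what it fetches. */
    static final class Consumer implements Runnable {
        private final List<String> source;          // pretend this is a remote server
        private final BlockingQueue<String> buffer; // hand-off point to the run loop

        Consumer(List<String> source, BlockingQueue<String> buffer) {
            this.source = source;
            this.buffer = buffer;
        }

        @Override
        public void run() {
            try {
                for (String message : source) {
                    buffer.put(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Stand-in for the TaskInstance loop: take buffered messages, hand them to the task. */
    static List<String> runLoop(List<String> source, Task task) {
        BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
        Thread consumerThread = new Thread(new Consumer(source, buffer), "consumer");
        consumerThread.start();

        List<String> output = new ArrayList<>();
        try {
            for (int i = 0; i < source.size(); i++) {
                String message = buffer.poll(1, TimeUnit.SECONDS);
                if (message == null) {
                    break; // the consumer stalled; stop instead of waiting forever
                }
                task.process(message, output);
            }
            consumerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return output;
    }

    public static void main(String[] args) {
        // The task never opens a connection; it only sees messages handed to it.
        Task upperCase = (message, output) -> output.add(message.toUpperCase());
        System.out.println(runLoop(Arrays.asList("edit-1", "edit-2"), upperCase));
    }
}
```

The point of the split is that the task passed to runLoop contains no
connection details at all; everything about where the messages come from
lives in the consumer.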

BTW, do not be confused by the main function in the WikipediaFeed; that was
for *testing* (sorry about that). If you are looking for the "real" main
function, one is in the SamzaContainer
<https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala>.
It is not necessary to understand Samza's internals in order to use
Samza, though. :)

Thanks,

Fang, Yan
yanfang724@gmail.com

On Wed, Jun 3, 2015 at 2:35 PM, Chas Pezanko <chas.pezanko@evariant.com>
wrote:

> Hello,
>
> So I am currently working on learning Samza for work and there is a
> question that I am looking for some direction through. In a Kafka
> Consumer/Producer in order to connect to ZooKeeper you can simply do so
> through a localhost port configuration and that creates a static connection
> for a topic. Through the Hello Samza example the connection to a stream is
> done through an IRC channel. I understand the function and requirement of
> a StreamTask in Samza in order to connect to a System Stream, however I am
> a little confused on the proper configuration of connecting to a local
> Zookeeper in order to consume and produce data across a stream.
>
> My understanding is that the actual connection to the SystemStream is
> specifically taking place in the StreamTask so I am assuming that the
> configuration would take place in the main class however I don't quite
> understand how the connection works through the main function. For example,
> the Wikipedia example holds the main function in the WikipediaFeed class,
> separate from the StreamTask.
>
> Any direction or assistance would be greatly appreciated as this is my
> first time working with this type of technology. Thank you.
>
> -Chas
>
