From: Jakob Homan
Date: Mon, 30 Nov 2015 12:31:39 -0800
Subject: Re: Executing Samza jobs natively in Kubernetes
To: dev@samza.apache.org

Hey Elias-
  This is awesome work. Would you be interested in opening JIRAs for the
changes you need so we can start to process them?

Thanks,
Jakob

On 30 November 2015 at 12:18, Roger Hoover wrote:
> Awesome. Thanks.
>
> On Sun, Nov 29, 2015 at 3:25 PM, Elias Levy wrote:
>
>> Roger,
>>
>> You are welcome. If you want to experiment, you can use my hello-samza
>> Docker image.
>>
>> On Sun, Nov 29, 2015 at 12:19 PM, Roger Hoover wrote:
>>
>> > Elias,
>> >
>> > I would also love to be able to deploy Samza on Kubernetes with dynamic
>> > task management. Thanks for sharing this. It may be a good interim
>> > solution.
>> >
>> > Roger
>> >
>> > On Sun, Nov 29, 2015 at 11:18 AM, Elias Levy <fearsome.lucidity@gmail.com>
>> > wrote:
>> >
>> > > I've been exploring Samza for stream processing as well as Kubernetes
>> > > as a container orchestration system, and I wanted to be able to use
>> > > one with the other.
>> > > The prospect of having to execute YARN either alongside or on top
>> > > of Kubernetes did not appeal to me, so I developed a KubernetesJob
>> > > implementation of SamzaJob.
>> > >
>> > > You can find the details at
>> > > https://github.com/eliaslevy/samza_kubernetes, but in summary
>> > > KubernetesJob executes and generates a serialized JobModel. Instead
>> > > of interacting with Kubernetes directly to create the SamzaContainers
>> > > (as the YarnJob's SamzaApplicationMaster may do with the YARN RM), it
>> > > outputs a config YAML file that can be used to create the
>> > > SamzaContainers in Kubernetes by using Replication Controllers. For
>> > > this you need to package your job as a Docker image. You can read
>> > > the README at the above repo for details.
>> > >
>> > > A few observations:
>> > >
>> > > It would be useful if SamzaContainer accepted the JobModel via an
>> > > environment variable. Right now it expects a URL to download it
>> > > from. I get around this by using an entry point script that copies
>> > > the model from an environment variable into a file, then passes a
>> > > file URL to SamzaContainer.
>> > >
>> > > SamzaContainer doesn't allow you to configure the JMX port. It
>> > > selects a port at random from the ephemeral range, as it expects to
>> > > execute in YARN, where a static port could result in a conflict.
>> > > This is not the case in Kubernetes, where each Pod (i.e.
>> > > SamzaContainer) is given its own IP address.
>> > >
>> > > This implementation doesn't provide a Samza dashboard, which in the
>> > > YARN implementation is hosted in the Application Master. There
>> > > didn't seem to be much value provided by the dashboard that is not
>> > > already provided by the Kubernetes tools for monitoring pods.
>> > >
>> > > I've successfully executed the hello-samza jobs in Kubernetes:
>> > >
>> > > $ kubectl get po
>> > > NAME                       READY     STATUS    RESTARTS   AGE
>> > > kafka-1-jjh8n              1/1       Running   0          2d
>> > > kafka-2-buycp              1/1       Running   0          2d
>> > > kafka-3-tghkp              1/1       Running   0          2d
>> > > wikipedia-feed-0-4its2     1/1       Running   0          1d
>> > > wikipedia-parser-0-l0onv   1/1       Running   0          17h
>> > > wikipedia-parser-1-crrxh   1/1       Running   0          17h
>> > > wikipedia-parser-2-1c5nn   1/1       Running   0          17h
>> > > wikipedia-stats-0-3gaiu    1/1       Running   0          16h
>> > > wikipedia-stats-1-j5qlk    1/1       Running   0          16h
>> > > wikipedia-stats-2-2laos    1/1       Running   0          16h
>> > > zookeeper-1-1sb4a          1/1       Running   0          2d
>> > > zookeeper-2-dndk7          1/1       Running   0          2d
>> > > zookeeper-3-46n09          1/1       Running   0          2d
>> > >
>> > > Finally, accessing services within the Kubernetes cluster from the
>> > > outside is quite cumbersome unless one uses an external load
>> > > balancer. This makes it difficult to bootstrap a job, as SamzaJob
>> > > must connect to Zookeeper and Kafka to find out the number of
>> > > partitions on the topics it will subscribe to, so it can assign them
>> > > statically among the number of containers requested.
>> > >
>> > > Ideally Samza would operate along the lines of the Kafka high-level
>> > > consumer, whose instances dynamically coordinate to allocate work
>> > > among members of a consumer group. This would do away with the need
>> > > to execute SamzaJob a priori to generate the JobModel to pass to the
>> > > SamzaContainers. It would also allow for dynamically changing the
>> > > number of containers without having to shut down the job.
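
The static assignment described in the thread (partitions divided among a
fixed number of containers before the job starts) can be sketched roughly
as follows. This is only an illustration under stated assumptions: it uses
a plain round-robin policy, the name assign_partitions is hypothetical and
is not part of Samza's API, and Samza's actual grouping logic may differ.

```python
def assign_partitions(num_partitions, num_containers):
    """Spread partition ids 0..num_partitions-1 over containers round-robin.

    Hypothetical helper for illustration only; not Samza's implementation.
    Returns a dict mapping container id -> list of partition ids.
    """
    if num_containers <= 0:
        raise ValueError("num_containers must be positive")
    if num_partitions < 0:
        raise ValueError("num_partitions must not be negative")

    # Every container gets an entry, even if it ends up with no partitions.
    assignment = {container: [] for container in range(num_containers)}
    for partition in range(num_partitions):
        assignment[partition % num_containers].append(partition)
    return assignment


if __name__ == "__main__":
    # Example: 8 partitions over 3 containers (counts chosen arbitrarily).
    for container, partitions in assign_partitions(8, 3).items():
        print(container, partitions)
```

Because the mapping is computed once from the partition count, changing the
number of containers means recomputing it and restarting the job, which is
the limitation the last paragraph of the thread points out.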