Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 2B03E200BD8 for ; Wed, 7 Dec 2016 21:16:54 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 299CD160B0C; Wed, 7 Dec 2016 20:16:54 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 73430160AF9 for ; Wed, 7 Dec 2016 21:16:53 +0100 (CET) Received: (qmail 54040 invoked by uid 500); 7 Dec 2016 20:16:51 -0000 Mailing-List: contact user-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list user@spark.apache.org Received: (qmail 54030 invoked by uid 99); 7 Dec 2016 20:16:51 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 07 Dec 2016 20:16:51 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id C08B01803A7 for ; Wed, 7 Dec 2016 20:16:50 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.02 X-Spam-Level: X-Spam-Status: No, score=-0.02 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=koeninger-org.20150623.gappssmtp.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id mBGGrLNPziqP for ; Wed, 7 Dec 2016 20:16:49 +0000 (UTC) Received: from mail-oi0-f52.google.com (mail-oi0-f52.google.com [209.85.218.52]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 691E95F5F9 for ; Wed, 7 Dec 2016 20:16:49 +0000 (UTC) Received: by mail-oi0-f52.google.com with SMTP id v84so429144414oie.3 for ; Wed, 07 Dec 2016 12:16:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=koeninger-org.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=RazcQQTmkrP/NAk2NfUnUa3FOxryTit4zpxyN8VZaWc=; b=0E1wXpYzN9GJ5JB4l/N0i6qwDI2ZBWcj+SF0WNjcrn9VClL6/nWI35qWAW9pSHIG91 Mf4afdD652ci9l3hbG9YfGxwfZ95dbX9nAZoAIrJTR4UU4tPPPCmyDgM7wHD6LQ53W7B kv6uF0Ub0uDRIAi2/ApG889cL5BXeZCFi+tiQO+u7YamLgV4EAPpyjXJQPahGlSCy47f H2wEpN6IUDd8LtDvCC2FNpZ7/fJdHNepbIf8Ft5gQObeL9pQgkHdLkuLDIuxEwgHtDro fkUJGrE/vAc86CSjokBC+SRZKlYSotAOF28N18U4UJwzhSzwprzt+QDuz6oLw/lOvoH/ EqKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=RazcQQTmkrP/NAk2NfUnUa3FOxryTit4zpxyN8VZaWc=; b=UkKXyP+39EVTt5tlnAKc+EIVY0Qo0OztZzfCCDD4lvRWx5QTgLDtw0KUczO9ee/I+/ tU7xENwXl+WVnOUwdixckEvTtCrNqOZEYQhqllhKIjG4e5WTvVHDorqpb8JlqeRs6pRB Wi9nPQCFizQXBTsf8g12SrXU2BiZNg/oGAbHRwEkUn00Ka3owVe0ie//ijGOKVBNmsdV pqtuZuiOkSYMvJRpFxjxAGyIOzfZAq6It//z5UQ6WLvfm/uGqtD2i2JPD/1PqEJCScwM ettgL+WHJonkGaOLwhByeqQgzOGedIFA/ZbZUvpKuTAqRYYPSzn0CqJYJXbqsy9WdISc E4Fw== X-Gm-Message-State: AKaTC011r9Jd4ymACDyQ1BpNZnKROdXcK0ovwS5RlwZnKSX6TFbqvNcxRwRjF4u7TvYXzPvPQiHJSTeAU/01fA== X-Received: by 10.157.46.135 with SMTP id w7mr36529544ota.64.1481141808139; Wed, 07 Dec 2016 12:16:48 -0800 (PST) MIME-Version: 1.0 Received: by 10.182.233.198 with HTTP; Wed, 7 Dec 2016 12:16:47 -0800 (PST) In-Reply-To: References: From: Cody Koeninger Date: Wed, 7 Dec 2016 14:16:47 -0600 Message-ID: Subject: Re: Reprocessing failed jobs in Streaming job To: map reduced Cc: "user @spark" Content-Type: text/plain; charset=UTF-8 archived-at: Wed, 07 Dec 2016 20:16:54 -0000 Personally I think forcing the stream to fail (e.g. check offsets in downstream store and throw exception if they aren't as expected) is the safest thing to do. If you proceed after a failure, you need a place to reliably record the batches that failed for later processing. On Wed, Dec 7, 2016 at 1:46 PM, map reduced wrote: > Hi, > > I am trying to solve this problem - in my streaming flow, every day few jobs > fail due to some (say kafka cluster maintenance etc, mostly unavoidable) > reasons for few batches and resumes back to success. > I want to reprocess those failed jobs programmatically (assume I have a way > of getting start-end offsets for kafka topics for failed jobs). I was > thinking of these options: > 1) Somehow pause streaming job when it detects failing jobs - this seems not > possible. > 2) From driver - run additional processing to check every few minutes using > driver rest api (/api/v1/applications...) what jobs have failed and submit > batch jobs for those failed jobs > > 1 - doesn't seem to be possible, and I don't want to kill streaming context > just for few failing batches to stop the job for some time and resume after > few minutes. > 2 - seems like a viable option, but a little complicated, since even the > batch job can fail due to whatever reasons and I am back to tracking that > separately etc. > > Does anyone has faced this issue or have any suggestions? > > Thanks, > KP --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscribe@spark.apache.org