From: backtrack5
To: user@spark.apache.org
Date: Sun, 25 Sep 2016 22:18:46 -0700 (MST)
Subject: Re: spark stream based deduplication

Thank you @markcitizen. Here is what I want to achieve. For example, say my historic RDD has:

(Hash1, recordid1)
(Hash2, recordid2)

and the new stream has:

(Hash3, recordid3)
(Hash1, recordid5)

In this scenario:

1) For recordid5, I should get "recordid5 is a duplicate of recordid1".
2) The new value (Hash3, recordid3) should be added to the historic RDD.

I have one more question: if the program crashes at any point, is it possible to recover the historic RDD? Can I use a stateful stream?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-stream-based-deduplication-tp27770p27792.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
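A minimal sketch of one way to do this with a stateful stream, using Spark Streaming's updateStateByKey in PySpark. Several assumptions are not from the post: input lines look like "hash,recordid" and arrive on a socket. The state kept per hash is a hypothetical (first_record_id, duplicates_in_this_batch) pair. The helper names update_state, duplicate_pairs, run_batch and build_context are illustrative only. The historic RDD is passed in as the initial state. The checkpoint directory that updateStateByKey requires is also what allows the state to be recovered after a crash.

```python
# Sketch only: stream-based deduplication with updateStateByKey.
# Hypothetical assumptions: records are "hash,recordid" text lines, and the
# per-hash state is (first_record_id, duplicates_seen_in_current_batch).


def update_state(new_record_ids, state):
    """Update function for DStream.updateStateByKey.

    new_record_ids: list of record ids with this hash in the current batch.
    state: None for an unseen hash, else (original_id, previous_duplicates).
    Returning None drops the key from the state.
    """
    if state is None:
        if not new_record_ids:
            return None  # nothing known about this hash
        # First sighting: treat the first record as the original.
        # Spark does not guarantee value order within a batch.
        return (new_record_ids[0], list(new_record_ids[1:]))
    original_id, _ = state
    # Keep the original; report only this batch's duplicates.
    return (original_id, list(new_record_ids))


def duplicate_pairs(hash_and_state):
    """Turn one (hash, state) entry into (duplicate_id, original_id) pairs."""
    _, (original_id, duplicates) = hash_and_state
    return [(dup, original_id) for dup in duplicates]


def run_batch(state_by_hash, batch):
    """Pure-Python stand-in for one micro-batch, for local testing only."""
    grouped = {}
    for h, rid in batch:
        grouped.setdefault(h, []).append(rid)
    new_state = {}
    for h in set(state_by_hash) | set(grouped):
        s = update_state(grouped.get(h, []), state_by_hash.get(h))
        if s is not None:
            new_state[h] = s
    return new_state


def build_context(checkpoint_dir, historic_pairs, host="localhost", port=9999):
    """Hypothetical wiring; needs a Spark installation to run."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="stream-dedup")
    ssc = StreamingContext(sc, 10)  # 10-second batches (arbitrary choice)
    ssc.checkpoint(checkpoint_dir)  # stateful transformations require this
    # Seed the state with the historic (hash, recordid) pairs.
    initial = sc.parallelize([(h, (rid, [])) for h, rid in historic_pairs])
    pairs = (ssc.socketTextStream(host, port)
             .map(lambda line: tuple(line.split(",", 1))))  # no input validation
    state = pairs.updateStateByKey(update_state, initialRDD=initial)
    state.flatMap(duplicate_pairs).pprint()  # e.g. ('recordid5', 'recordid1')
    return ssc
```

To recover after a crash, create the context with StreamingContext.getOrCreate(checkpoint_dir, lambda: build_context(checkpoint_dir, historic_pairs)) instead of calling build_context directly. If a checkpoint exists, the DStream graph and the accumulated state are restored from it rather than rebuilt. Check that your PySpark version accepts the initialRDD argument. Also note that updateStateByKey visits every key on every batch, so its cost grows with the number of distinct hashes. The Scala/Java mapWithState API, which PySpark does not expose, only touches keys that have new data.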