From: Peter Ertl
To: user@flink.apache.org
Date: Mon, 7 Aug 2017 21:00:49 +0200
Subject: load + update global state

Hi folks,

I am coding a streaming task that processes HTTP requests from our website and enriches them with additional information.
The enrichment data is a lookup table containing session ids from historic requests and the related emails that were used within those sessions in the past:

  lookup - hashtable: session_id: String => emails: Set[String]

During processing of these NEW HTTP requests:

  - the lookup table should be used to get the previous emails and enrich the current stream item
  - new candidates for the lookup table will be discovered while processing these items and should be added to the lookup table (these changes should also be visible across the cluster)

I see at least the following issues:

(1) Loading the state as a whole from the data store into memory burns a huge amount of memory (and making changes visible cluster-wide is also an issue).

(2) Not loading it into memory but using something like Cassandra / Redis as a lookup store would certainly work, but it introduces a lot of network requests (possible ideas: use a distributed cache? broadcast updates in the Flink cluster?).

(3) How should I integrate the changes to the table with Flink's checkpointing?

I really don't see how best to solve this, and my current solution is far from elegant...

So is there any best practice for supporting "large lookup tables that change during stream processing"?

Cheers
Peter
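For reference, the lookup table and the two per-request steps described in the message (enrich from the table, then add newly discovered emails) can be modelled in memory as below. This is a minimal single-process sketch in Python, not Flink code: `HttpRequest`, `EnrichedRequest` and `SessionEmailLookup` are hypothetical names, the request is assumed to carry a `session_id` and an optional `email`, and distribution, memory limits and checkpointing (issues 1 to 3) are deliberately left out. It also assumes the enrichment should see only emails known *before* the current request, so the lookup happens before the update.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class HttpRequest:
    # Hypothetical request shape, reduced to the fields this sketch needs.
    session_id: str
    email: Optional[str] = None  # email observed in this request, if any


@dataclass
class EnrichedRequest:
    request: HttpRequest
    previous_emails: FrozenSet[str]  # emails known for the session before this request


class SessionEmailLookup:
    """In-memory model of the lookup table: session_id -> set of emails."""

    def __init__(self, initial: Optional[Dict[str, Set[str]]] = None) -> None:
        # Copy the input sets so later updates do not mutate the caller's data.
        self._table: Dict[str, Set[str]] = {
            sid: set(emails) for sid, emails in (initial or {}).items()
        }

    def previous_emails(self, session_id: str) -> FrozenSet[str]:
        # An unknown session simply has no previous emails.
        return frozenset(self._table.get(session_id, ()))

    def add_candidate(self, session_id: str, email: str) -> bool:
        # Returns True only if the email was not yet known for this session.
        emails = self._table.setdefault(session_id, set())
        if email in emails:
            return False
        emails.add(email)
        return True

    def process(self, request: HttpRequest) -> EnrichedRequest:
        # Step 1: enrich with what the table knew before this request.
        enriched = EnrichedRequest(request, self.previous_emails(request.session_id))
        # Step 2: record a newly discovered candidate for later requests.
        if request.email is not None:
            self.add_candidate(request.session_id, request.email)
        return enriched


if __name__ == "__main__":
    lookup = SessionEmailLookup({"s1": {"alice@example.com"}})
    # prints ['alice@example.com']
    print(sorted(lookup.process(HttpRequest("s1", "bob@example.com")).previous_emails))
    # prints ['alice@example.com', 'bob@example.com']
    print(sorted(lookup.process(HttpRequest("s1")).previous_emails))
```

Reading before writing keeps the "previous emails" semantics unambiguous; whether the current request's own email should be included is a design choice the message leaves open.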
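Issue (1) assumes every node holds the whole table. One option worth weighing, sketched here under stated assumptions, is to partition the table by `session_id`: each parallel instance then keeps only the sessions routed to it, and since a request and any update for a session go to the same instance, that update needs no cluster-wide broadcast. This is the idea behind Flink's keyed state after a `keyBy` on the key, but the code below is only a plain Python model; `owner_of` and `PartitionedLookup` are hypothetical names, and CRC32 modulo parallelism is a stand-in, not Flink's actual key-assignment scheme.

```python
import zlib
from typing import Dict, List, Set


def owner_of(session_id: str, parallelism: int) -> int:
    # Stable hash: Python's built-in hash() on str is randomized per process,
    # so CRC32 is used to route a session id to the same instance every time.
    return zlib.crc32(session_id.encode("utf-8")) % parallelism


class PartitionedLookup:
    """Hypothetical model: each parallel instance keeps only the sessions it owns."""

    def __init__(self, parallelism: int) -> None:
        self.parallelism = parallelism
        # One independent dict per parallel instance ("shard").
        self.shards: List[Dict[str, Set[str]]] = [{} for _ in range(parallelism)]

    def add(self, session_id: str, email: str) -> int:
        # Updates for a session land on the instance that owns it; returns its index.
        idx = owner_of(session_id, self.parallelism)
        self.shards[idx].setdefault(session_id, set()).add(email)
        return idx

    def lookup(self, session_id: str) -> Set[str]:
        # Reads are routed the same way, so no other shard needs the update.
        idx = owner_of(session_id, self.parallelism)
        return set(self.shards[idx].get(session_id, ()))
```

Assuming the hash spreads keys evenly, each shard holds roughly 1/parallelism of the sessions. The model does not address how to fall back to an external store (issue 2) or how to snapshot the shards consistently (issue 3).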