Date: Thu, 18 Jan 2018 16:06:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Commented] (FLINK-8453) Add SerializableExecutionGraphStore to Dispatcher

[ https://issues.apache.org/jira/browse/FLINK-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330684#comment-16330684 ]

ASF GitHub Bot commented on FLINK-8453:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/5310

[FLINK-8453] [flip6] Add SerializableExecutionGraphStore to Dispatcher

## What is the purpose of the change

The SerializableExecutionGraphStore is responsible for storing completed jobs for historic job requests (e.g. from the web ui or from the client). The store is populated by the Dispatcher once a job has terminated.

The FileSerializableExecutionGraphStore implementation persists all SerializableExecutionGraphs on disk in order to avoid OOM problems. It only keeps some of the stored graphs in memory until it reaches a configurable size.
Once it comes close to this size, it evicts elements and reloads them only if they are requested again. Additionally, the FileSerializableExecutionGraphStore defines an expiration time after which the execution graphs will be removed from disk. This prevents excessive use of disk resources.

This PR is based on #5309.

## Brief change log

- Introduce `SerializableExecutionGraphStore` and `FileSerializableExecutionGraphStore`
- Add `FileSerializableExecutionGraphStore` to `Dispatcher`
- Store `SerializableExecutionGraphs` in corresponding `FileSerializableExecutionGraphStore`
- Adapt `Dispatcher` to serve requests for historic jobs

## Verifying this change

- Added `FileSerializableExecutionGraphStoreTest`

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented?
(not applicable)

cc @GJL

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink addHistoricJobView

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5310.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5310

----

commit a959b9411833e320065b328ed2fc936b58f911f4
Author: Till Rohrmann
Date: 2018-01-16T17:45:53Z

[FLINK-8449] [flip6] Extend OnCompletionActions to accept an SerializableExecutionGraph

This commit introduces the SerializableExecutionGraph, which extends the AccessExecutionGraph and adds serializability to it. Moreover, this commit changes the OnCompletionActions interface such that it accepts a SerializableExecutionGraph instead of a plain JobResult. This allows archiving the completed ExecutionGraph for further usage in the container component of the JobMasterRunner.

commit ca15b076c05ff940a12a240ba385e2434f93790b
Author: Till Rohrmann
Date: 2018-01-18T14:02:36Z

[hotfix] [tests] Let BucketingSink extend TestLogger

commit 21c25502fb6d07c6fb65f18100dc6d4ec23e9d93
Author: Till Rohrmann
Date: 2018-01-17T14:01:57Z

[FLINK-8450] [flip6] Make JobMaster/DispatcherGateway#requestJob type safe

Let JobMasterGateway#requestJob and DispatcherGateway#requestJob return a CompletableFuture instead of a CompletableFuture. In order to support the old code and the JobManagerGateway implementation, we have to keep the return type in RestfulGateway. Once the old code has been removed, we should change this as well.

commit 7b7b0692582189b8e540e5ae022d351c45991e43
Author: Till Rohrmann
Date: 2018-01-17T11:22:43Z

[FLINK-8453] [flip6] Add SerializableExecutionGraphStore to Dispatcher

The SerializableExecutionGraphStore is responsible for storing completed jobs for historic job requests (e.g. from the web ui or from the client).
The store is populated by the Dispatcher once a job has terminated. The FileSerializableExecutionGraphStore implementation persists all SerializableExecutionGraphs on disk in order to avoid OOM problems. It only keeps some of the stored graphs in memory until it reaches a configurable size. Once it comes close to this size, it evicts elements and reloads them only if they are requested again. Additionally, the FileSerializableExecutionGraphStore defines an expiration time after which the execution graphs will be removed from disk. This prevents excessive use of disk resources.

----

> Add SerializableExecutionGraphStore to Dispatcher
> -------------------------------------------------
>
> Key: FLINK-8453
> URL: https://issues.apache.org/jira/browse/FLINK-8453
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination, REST
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.0
>
>
> The {{Dispatcher}} should have a {{SerializableExecutionGraphStore}} which it can use to store completed jobs. This store can then be used to serve historic job requests from the web UI, for example. The default implementation should persist the jobs to disk and evict the in-memory instances once they grow too big in order to avoid memory leaks. Additionally, the store should expire elements from disk after a user-defined time.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
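
The store behaviour described above (persist everything to disk, keep a bounded number of entries in memory, reload evicted entries on request, expire entries from disk after a fixed time) can be illustrated with a minimal sketch. This is not the code from the pull request: the class `FileBackedStoreSketch` and all of its method names are hypothetical, the cache here is bounded by entry count rather than by size, timestamps are passed in explicitly instead of being driven by a timer, and job ids are assumed to be safe to use as file names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; NOT Flink's FileSerializableExecutionGraphStore. */
final class FileBackedStoreSketch<V extends Serializable> {

    private final Path directory;
    private final long expirationMillis;
    // Job id -> time at which the entry was written to disk.
    private final Map<String, Long> storedAt = new HashMap<>();
    // Bounded in-memory cache in access order; the least recently used entry is evicted first.
    private final Map<String, V> cache;

    FileBackedStoreSketch(Path directory, int maxCachedEntries, long expirationMillis) {
        this.directory = directory;
        this.expirationMillis = expirationMillis;
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxCachedEntries;
            }
        };
    }

    /** Writes the value to disk first, then caches it in memory. */
    void put(String jobId, V value, long nowMillis) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(fileFor(jobId)))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        storedAt.put(jobId, nowMillis);
        cache.put(jobId, value);
    }

    /** Returns the cached value, reloading it from disk if it was evicted. */
    @SuppressWarnings("unchecked")
    V get(String jobId) {
        if (!storedAt.containsKey(jobId)) {
            return null; // unknown or already expired
        }
        V cached = cache.get(jobId);
        if (cached != null) {
            return cached;
        }
        try (ObjectInputStream in =
                new ObjectInputStream(Files.newInputStream(fileFor(jobId)))) {
            V loaded = (V) in.readObject();
            cache.put(jobId, loaded);
            return loaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Removes every entry older than the expiration time from disk and from memory. */
    void removeExpired(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = storedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expirationMillis) {
                try {
                    Files.deleteIfExists(fileFor(entry.getKey()));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                cache.remove(entry.getKey());
                it.remove();
            }
        }
    }

    /** Reports whether the value is currently held in memory (does not touch access order). */
    boolean isCached(String jobId) {
        return cache.containsKey(jobId);
    }

    // Assumption: the job id is a valid file name.
    private Path fileFor(String jobId) {
        return directory.resolve(jobId + ".bin");
    }
}
```

With a cache limit of one entry, storing two jobs leaves only the second in memory, and requesting the first reloads it from its file. A production store would more likely bound the cache by the size of the graphs and run the expiry on a scheduled executor.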