From: StephanEwen
To: issues@flink.incubator.apache.org
Date: Thu, 3 Sep 2015 09:44:24 +0000 (UTC)
Subject: [GitHub] flink pull request: [FLINK-2583] Add Stream Sink For Rolling HDFS ...
Message-Id: <20150903094424.8A599DFBD6@git1-us-west.apache.org>

GitHub user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1084#issuecomment-137393554

I think using truncate for exactly-once is the way to go.

To support users with older HDFS versions, how about this:

1. We consider valid only what was successfully written (hflush/hsync) at a checkpoint. When we roll over to a new file on restart, we write a `.length` file for the previous (in-progress) file that records how many of its bytes are valid. This simulates truncate by adding a metadata file.

2. Optionally, the user can activate merge-on-rollover, which takes all the files from the attempts, together with their metadata files, and merges them into one file. This merge can be written so that it works incrementally, retries on failures, etc.
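A minimal sketch of the two proposals above, under loudly stated assumptions: a local filesystem via `java.nio.file` stands in for HDFS (real code would go through the Hadoop `FileSystem` API), the checkpointed byte count is assumed to be already known from hflush/hsync, and the class name, the `<part>.length` naming, the `.inprogress` temp suffix and the `truncateSupported` flag are all hypothetical, not Flink's API. It shows truncating when possible, otherwise writing a length file, readers honouring that file, and a merge step that can simply be re-run after a failure.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; local files stand in for HDFS part files.
final class ValidLengthSketch {

    // Hypothetical naming: the metadata file sits next to the part file.
    static Path lengthFileFor(Path partFile) {
        return partFile.resolveSibling(partFile.getFileName() + ".length");
    }

    // On restore: cut the part file back to the checkpointed length if the
    // filesystem can truncate; otherwise record the valid length in a sidecar file.
    static void restore(Path partFile, long checkpointedLength, boolean truncateSupported)
            throws IOException {
        if (truncateSupported) {
            try (FileChannel ch = FileChannel.open(partFile, StandardOpenOption.WRITE)) {
                ch.truncate(checkpointedLength);
            }
        } else {
            Files.write(lengthFileFor(partFile),
                    Long.toString(checkpointedLength).getBytes(StandardCharsets.UTF_8));
        }
    }

    // Readers only trust the prefix named by the length file, if one exists.
    static byte[] readValid(Path partFile) throws IOException {
        byte[] all = Files.readAllBytes(partFile);
        Path lengthFile = lengthFileFor(partFile);
        if (!Files.exists(lengthFile)) {
            return all; // no metadata: the whole file is valid
        }
        long valid = Long.parseLong(
                new String(Files.readAllBytes(lengthFile), StandardCharsets.UTF_8).trim());
        return Arrays.copyOf(all, (int) Math.min(valid, all.length));
    }

    // Optional merge-on-rollover: concatenate the valid prefixes of all attempt
    // files into a temp file, then rename it into place. Inputs are never modified,
    // so a failed merge can be retried from scratch.
    static void merge(List<Path> attempts, Path target) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".inprogress");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            for (Path attempt : attempts) {
                out.write(readValid(attempt));
            }
        }
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Writing the merged output to a temporary file and renaming it keeps the merge idempotent: if it fails partway, the attempt files and their `.length` files are unchanged and the merge can be run again.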