Subject: Re: AbstractFileOutputOperator to be used with ftp and s3 file System
From: Chandni Singh
To: dev@apex.incubator.apache.org
Date: Tue, 3 Nov 2015 15:55:40 -0800

Here is an abstract implementation that can work with filesystems that don't
support append:

https://github.com/chandnisingh/Malhar/blob/examples/library/src/main/java/com/datatorrent/lib/io/fs/AbstractNonAppendFileOutputOperator.java
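The linked class isn't reproduced here, but as a rough sketch of the general
idea (the helper name, method, and the copy-and-rewrite fallback are
illustrative assumptions, not the linked code): on a FileSystem without
append(), an existing file's bytes can be replayed onto a freshly created
stream before new data is written.

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class NonAppendStreamHelper
{
  /**
   * Open a stream positioned at the end of 'path', falling back to a
   * copy-and-rewrite when the file system does not implement append()
   * (FTPFileSystem and S3 are examples of such file systems).
   */
  public static FSDataOutputStream openForWrite(FileSystem fs, Path path) throws IOException
  {
    if (!fs.exists(path)) {
      return fs.create(path, false);   // new file, nothing to append to
    }
    try {
      return fs.append(path);          // supported on HDFS and similar
    } catch (UnsupportedOperationException | IOException e) {
      // Fallback: save the existing bytes, recreate the file with
      // overwrite=true and replay them, then keep writing on the new stream.
      // (A production version would detect append support explicitly instead
      // of catching IOException this broadly.)
      Path tmp = path.suffix(".tmp");
      try (InputStream in = fs.open(path); FSDataOutputStream tmpOut = fs.create(tmp, true)) {
        IOUtils.copyBytes(in, tmpOut, 4096, false);
      }
      FSDataOutputStream out = fs.create(path, true);
      try (InputStream in = fs.open(tmp)) {
        IOUtils.copyBytes(in, out, 4096, false);
      }
      fs.delete(tmp, true);
      return out;
    }
  }
}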
On Tue, Nov 3, 2015 at 9:45 AM, Chandni Singh wrote:

> Will do.
>
> On Tue, Nov 3, 2015 at 9:41 AM, Thomas Weise wrote:
>
>> Agreed, there will be applications that write to many files that cannot
>> all remain open forever.
>>
>> Can you provide an example of how to modify the append behavior depending
>> on the FileSystem implementation?
>>
>> https://malhar.atlassian.net/browse/MLHR-1888
>>
>> On Tue, Nov 3, 2015 at 9:35 AM, Chandni Singh wrote:
>>
>> > Hi,
>> >
>> > Please look at the latest changes to this operator. These changes make
>> > stream opening and closing overridable, so an implementation can control
>> > how it achieves append(), if at all.
>> >
>> > From its conception this operator has been based on a cache of open
>> > streams with a maximum size: if at any point that limit is near, the
>> > cache evicts entries (closes streams). Another setting is the expiry
>> > time, which evicts and closes a stream when it hasn't been accessed in
>> > the cache for a while.
>> >
>> > If the user actually never wants to close the streams, they can set both
>> > of these values to their respective maximums. But in a real-world
>> > scenario the user needs to know when a file will eventually be closed
>> > (never written to again) and use that information to configure these
>> > settings, or again set them to their maximums and close the streams
>> > explicitly.
>> >
>> > Say we didn't have this cache and were writing to multiple files. That
>> > would imply that multiple streams always hang around in memory, even if
>> > they are never accessed. In my opinion that is a problematic design
>> > which will cause bigger issues, such as running out of memory.
>> >
>> > Chandni
>> >
>> > On Tue, Nov 3, 2015 at 7:58 AM, Thomas Weise wrote:
>> >
>> > > Append is used to continue writing to files that were closed and left
>> > > in a consistent state before. When append is not available, would we
>> > > then need to disable the optimization that closes and reopens files?
>> > >
>> > > On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath <ram@datatorrent.com>
>> > > wrote:
>> > >
>> > > > Shouldn't "append" be a user-configurable property which, if false,
>> > > > causes the file to be overwritten?
>> > > >
>> > > > Ram
>> > > >
>> > > > On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > AbstractFileOutputOperator is used to write output files. The
>> > > > > operator has a method "getFSInstance", which initializes the file
>> > > > > system. One can override this method to initialize the desired
>> > > > > file system, which extends Hadoop FileSystem. In our
>> > > > > implementation we have overridden "getFSInstance" to initialize
>> > > > > FTPFileSystem.
>> > > > >
>> > > > > The file loading code in the setup method of
>> > > > > AbstractFileOutputOperator opens the file in append mode when the
>> > > > > file is already present. The issue is that FTPFileSystem doesn't
>> > > > > support the append function.
>> > > > >
>> > > > > The solution to this problem could be:
>> > > > > 1. Override the append method in FTPFileSystem.
>> > > > >    - This would be tricky, as the file system doesn't support the
>> > > > >      operation. And there are other file systems, such as S3,
>> > > > >      which also don't support append.
>> > > > > 2. Avoid using functions like "append" which are not supported by
>> > > > >    some implementations of Hadoop FileSystem.
>> > > > > 3. Write the file loading logic (which is in the setup method) in
>> > > > >    methods that can be overridden by a subclass to change how
>> > > > >    files are loaded (avoiding calls like append that are not
>> > > > >    supported by the user's chosen file system).
>> > > > >
>> > > > > -Priyanka
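For the archive, a rough sketch of the getFSInstance() override Priyanka
describes above (the class name, the ftpUri property, and the exact method
signature are illustrative assumptions rather than code taken from Malhar):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;

/**
 * Illustration only: route the operator's output to an FTP server by
 * overriding getFSInstance(). The class stays abstract; the tuple-to-file
 * mapping methods of AbstractFileOutputOperator are left to the concrete
 * subclass.
 */
public abstract class FTPFileOutputOperator<INPUT> extends AbstractFileOutputOperator<INPUT>
{
  // e.g. ftp://user:password@host/ -- set from the application configuration
  private String ftpUri;

  @Override
  protected FileSystem getFSInstance() throws IOException
  {
    Configuration conf = new Configuration();
    // fs.ftp.impl defaults to org.apache.hadoop.fs.ftp.FTPFileSystem,
    // so an ftp:// URI resolves to FTPFileSystem here.
    return FileSystem.newInstance(URI.create(ftpUri), conf);
  }

  public void setFtpUri(String ftpUri)
  {
    this.ftpUri = ftpUri;
  }
}

With a FileSystem obtained this way the rest of the operator is unchanged,
which is why option 3 above (making the file loading logic overridable)
still matters for the append call in setup.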