Subject: Re: AbstractFileOutputOperator to be used with ftp and s3 file System
From: Chandni Singh
To: dev@apex.incubator.apache.org
Date: Tue, 3 Nov 2015 09:35:42 -0800

Hi,

Please look at the latest changes to this operator. These changes make it possible to override how streams are opened and closed, so an implementation can control how it achieves append(), if at all.

From its conception this operator has been based on a cache of open streams with a maximum size: whenever that limit is reached, the cache evicts entries (closes streams). Another setting is the expiry time, which evicts and closes a stream when it hasn't been accessed for a while.

If the user truly never wants a stream to be closed, they can set both of these values to their maximums. In a realistic scenario, though, the user needs to know when a file is finished (will never be written to again); with that information they can tune these settings, or again set them to their maximums and close the streams explicitly.

Suppose we didn't have this cache and were writing to multiple files. Then multiple streams would stay in memory the whole time, even when they weren't being accessed. In my opinion that is a problematic design which will regularly cause bigger problems such as running out of memory.

Chandni

On Tue, Nov 3, 2015 at 7:58 AM, Thomas Weise wrote:

> Append is used to continue writing to files that were closed and left in a
> consistent state before. When append is not available, would we then need
> to disable the optimization of closing and reopening files?
>
> On Tue, Nov 3, 2015 at 6:14 AM, Munagala Ramanath wrote:
>
> > Shouldn't "append" be a user-configurable property which, if false,
> > causes the file to be overwritten?
> >
> > Ram
> >
> > On Mon, Nov 2, 2015 at 10:51 PM, Priyanka Gugale wrote:
> >
> > > Hi,
> > >
> > > AbstractFileOutputOperator is used to write output files. The operator
> > > has a method "getFSInstance", which initializes the file system. One can
> > > override this method to initialize any desired file system that extends
> > > Hadoop's FileSystem. In our implementation we have overridden
> > > "getFSInstance" to initialize FTPFileSystem.
> > >
> > > The file-loading code in the setup method of AbstractFileOutputOperator
> > > opens a file in append mode when the file is already present. The issue
> > > is that FTPFileSystem doesn't support the append operation.
> > >
> > > Possible solutions:
> > > 1. Override the append method in FTPFileSystem.
> > >    - This would be tricky, since the underlying file system doesn't
> > >      support the operation, and other file systems such as S3 don't
> > >      support append either.
> > > 2. Avoid calls like "append" that are not supported by some
> > >    implementations of Hadoop FileSystem.
> > > 3. Move the file-loading logic (currently in the setup method) into
> > >    methods that a subclass can override, so the subclass can avoid calls
> > >    like append that are not supported by the user's chosen file system.
> > >
> > > -Priyanka
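
For illustration, here is a minimal sketch of the bounded stream cache described above, assuming a Guava LoadingCache with maximumSize and expireAfterAccess. The class and field names are made up for the example; this is not the operator's actual code:

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

// Hypothetical illustration of a bounded cache of open output streams:
// least-recently-used streams are closed on eviction, and streams that
// have not been accessed for a while expire and are closed as well.
public class StreamCacheSketch
{
  private final LoadingCache<String, OutputStream> streams;

  public StreamCacheSketch(final FileSystem fs, int maxOpenStreams, long expiryMillis)
  {
    this.streams = CacheBuilder.newBuilder()
        .maximumSize(maxOpenStreams)                             // evict (close) streams once the limit is reached
        .expireAfterAccess(expiryMillis, TimeUnit.MILLISECONDS)  // close streams that have not been used recently
        .removalListener(new RemovalListener<String, OutputStream>()
        {
          @Override
          public void onRemoval(RemovalNotification<String, OutputStream> notification)
          {
            try {
              notification.getValue().close();                   // closing on eviction keeps memory bounded
            } catch (IOException e) {
              throw new RuntimeException(e);
            }
          }
        })
        .build(new CacheLoader<String, OutputStream>()
        {
          @Override
          public OutputStream load(String fileName) throws IOException
          {
            return fs.create(new Path(fileName));                // the real operator may reopen/append instead
          }
        });
  }

  public void write(String fileName, byte[] bytes) throws Exception
  {
    streams.get(fileName).write(bytes);                          // opens the stream on first use, reuses it afterwards
  }
}

Setting maximumSize and the expiry to very large values approximates "never close", at the cost of keeping every open stream in memory.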
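
And a rough sketch of a subclass that supplies an FTPFileSystem via getFSInstance and works around the missing append, in the spirit of option 3 above. The openStream hook, the ftpUri property, the getFileName/getBytesForTuple bodies and the Malhar package name are assumptions for the example, not the operator's actual code:

import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ftp.FTPFileSystem;

import com.datatorrent.lib.io.fs.AbstractFileOutputOperator;  // assumed Malhar package

public class FtpFileOutputOperator extends AbstractFileOutputOperator<byte[]>
{
  // hypothetical property, e.g. "ftp://user:password@host:21"
  private String ftpUri;

  @Override
  protected FileSystem getFSInstance() throws IOException
  {
    // supply an FTP-backed FileSystem instead of the default one
    FTPFileSystem ftpFs = new FTPFileSystem();
    ftpFs.initialize(URI.create(ftpUri), new Configuration());
    return ftpFs;
  }

  // Hypothetical hook in the spirit of option 3: the base class would call this
  // from setup() instead of hard-coding append(), and a subclass overrides it
  // for file systems that cannot append.
  protected OutputStream openStream(FileSystem fs, Path path, boolean fileExists) throws IOException
  {
    // FTP (and S3) cannot append, so overwrite the existing file; an alternative
    // is to copy it locally, rewrite it and upload it again.
    return fs.create(path, true);
  }

  @Override
  protected String getFileName(byte[] tuple)
  {
    return "output.txt";  // placeholder naming for the example
  }

  @Override
  protected byte[] getBytesForTuple(byte[] tuple)
  {
    return tuple;
  }

  public void setFtpUri(String ftpUri)
  {
    this.ftpUri = ftpUri;
  }
}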