Date: Mon, 26 Mar 2012 12:53:59 -0700
Subject: Re: [DISCUSS] Remove append?
From: Colin McCabe
To: hdfs-dev@hadoop.apache.org

On Fri, Mar 23, 2012 at 7:44 PM, Scott Carey wrote:
>
>
> On 3/22/12 10:25 AM, "Eli Collins" wrote:
>
>>On Thu, Mar 22, 2012 at 1:26 AM, Konstantin Shvachko
>> wrote:
>>> Eli,
>>>
>>> I went over the entire discussion on the topic, and did not get it. Is
>>> there a problem with append? We know it does not work in hadoop-1,
>>> only flush() does. Is there anything wrong with the new append
>>> (HDFS-265)? If so please file a bug.
>>> I tested it in Hadoop-0.22 branch it works fine.
>>>
>>> I agree with people who were involved with the implementation of the
>>> new append that the complexity is mainly in
>>> 1. pipeline recovery
>>> 2. consistent client reading while writing, and
>>> 3. hflush()
>>> Once it is done the append itself, which is reopening of previously
>>> closed files for adding data, is not complex.
>>>
>>
>>I agree that much of the complexity is in #1-3 above, which is why
>>HDFS-265 is leveraged.
>>The primary simplicity of not having append (and truncate) comes from
>>not leveraging the invariant that finalized blocks are immutable, that
>>blocks once written won't eg shrink in size (which we assume today).
>
> That invariant can co-exist with append via copy-on-write.  The new state
> and old state would co-exist until the old state was not needed, a file's
> block map would have to use a persistent data structure.
> Copy on write
> semantics with blocks in file systems is all the rage these days.  Free
> snapshots, atomic transactions for operations on multiple blocks, etc.

Hi Scott,

If a client accesses a file, and then the client becomes unresponsive,
how long should you wait before declaring the blocks he was looking at
unused? No matter how long or how short a period you choose, someone
will argue with it. And having to track this kind of state in the
NameNode introduces a huge amount of complexity, not to mention extra
memory consumption. Basically, we would have to track the ID of every
block that any client looked at, at all times.

Colin

>
>>
>>> You mentioned it and I agree you indeed should be more involved with
>>> your customer base. As for eBay, append was of the motivations to work
>>> on stabilizing 0.22 branch. And there is a lot of use cases which
>>> require append for our customers.
>>> Some of them were mentioned in this discussion.
>>>
>>
> >From what I've seen 0.22 isn't ready for production use. Aside from
>>not supporting critical features like security, it doesn't have a
>>size-able user-base behind it testing and fixing bugs, etc. All things
>>I'd imagine an org like eBay would want.  I've never gotten a request
>>to support 0.22 from a customer.
>>
>>Thanks,
>>Eli
>
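
For illustration only, the per-client state described in Colin's reply
above (the ID of every block any client has looked at, plus some timeout
for unresponsive clients) might look roughly like the sketch below. This
is Python, not HDFS code: the class name, the method names, and the
timeout parameter are all hypothetical, and nothing here reflects how the
NameNode is actually implemented.

```python
import time
from collections import defaultdict


class BlockReferenceTracker:
    """Hypothetical bookkeeping for old copy-on-write block versions.

    Not an HDFS API; names and structure are assumptions for illustration.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # the contested waiting period
        self.clock = clock
        # client id -> {block id -> time of that client's last access}
        self.last_access = defaultdict(dict)

    def record_read(self, client_id, block_id):
        """Note that a client looked at a block just now."""
        self.last_access[client_id][block_id] = self.clock()

    def expire(self):
        """Drop references older than the timeout.

        Returns the block ids that were dropped and are no longer
        referenced by any client.
        """
        now = self.clock()
        dropped = set()
        for client_id in list(self.last_access):
            blocks = self.last_access[client_id]
            for block_id in list(blocks):
                if now - blocks[block_id] > self.timeout_seconds:
                    del blocks[block_id]
                    dropped.add(block_id)
            if not blocks:
                del self.last_access[client_id]
        still_held = {b for blocks in self.last_access.values() for b in blocks}
        return dropped - still_held

    def tracked_entries(self):
        """Number of (client, block) pairs held in memory."""
        return sum(len(blocks) for blocks in self.last_access.values())
```

Even in this toy form the memory held grows with the number of
(client, block) pairs, and the choice of timeout_seconds is exactly the
value the reply expects people to argue about.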