From: Andrew Wang
Date: Wed, 18 Sep 2013 12:13:35 -0700
Subject: Re: symlink support in Hadoop 2 GA
To: "hdfs-dev@hadoop.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=089e0149ca3a25bd2a04e6ad3d8d

--089e0149ca3a25bd2a04e6ad3d8d
Content-Type: text/plain; charset=ISO-8859-1

It's an incompatible change. Existing APIs like listStatus and globStatus
need to be symlink aware now, which can break assumptions of user code.
We've had FileStatus#isSymlink() since the early days, but lots of user
code hasn't been updated to use it.

I think Eli's earlier email did a good job at laying out the current state
and our options. I didn't realize this before, but most of HADOOP-8040 is
already in branch-2.1-beta, but many of the subsequent changes are not
(e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state
of symlink support in branch-2.1-beta is half-baked, which is why "do
nothing" is not a good option.

With that in mind, perhaps Eli's proposals (abbreviated here) make more
sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like
   HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out
   of this post-GA, but we can do our best to fix these issues compatibly
   in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that
   makes 2.3 a pretty big jump from GA. Since symlinks have already
   appeared in the 2.1.0 release, it'd also technically make 2.2 a
   regression from 2.1.0.
3) Wait for 3.0, which I don't think anyone wants.

On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
> On 18 September 2013 17:24, Eli Collins wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran
> > > > <stevel@hortonworks.com> wrote:
> > > >
> > > > > I'm reluctant for this as while delaying the release, because we
> > > > > are going to find problems all the way up the stack -which will
> > > > > require a choreographed set of changes. Given the grief of the
> > > > > protobuf update, I don't want to go near that just before the
> > > > > final release.
> > > >
> > > > Well, I would use the exact same argument used for protobuf (which
> > > > only complication was getting protoc 2.5.0 in the jenkins boxes and
> > > > communicate developers to do the same, other than that we didn't
> > > > hit any other issue AFAIK) ...
> > >
> > > protobuf was traumatic at build time, as I recall because it was
> > > neither forwards nor backwards compatible. Those of us trying to build
> > > different branches had to choose which version to have on the path,
> > > or set up scripts to do the switching. HBase needed rebuilding, so did
> > > other things. And I still have the pain of downloading and installing
> > > protoc on all Linux VMs I build up going forward, until apt-get and
> > > yum have protoc 2.5 artifacts.
> > >
> > > This means it was very painful for developers, added a lot of late
> > > breaking pain to the developers, but it had one key feature that gave
> > > it an edge: it was immediately obvious where you had a problem as
> > > things didn't compile or classload without linkage problems. No latent
> > > bugs, unless protobuf 2.5 has them internally -for which we have to
> > > rely on google's release testing to have found.
> > >
> > > That is a lot simpler to regression test than adding any new feature
> > > to HDFS and seeing what breaks -as that is something that only
> > > surfaces out in the field. Which is why I think it's too late in the
> > > 2.1 release timetable to add symlinks. We've had a 2.1-beta out there,
> > > we've got feedback. Fix those problems that are show stoppers, but
> > > don't add more stuff. Which is precisely why I have not been pushing
> > > in any of my recent changes. I may seem ruthless arguing against
> > > symlinks -but I'm not being inconsistent with my own commit history.
> > > The only two things I've put in branch-2.1 since beta-1 were a
> > > separate log for the Configuration deprecation warnings and a patch
> > > to the POM for a java7 build on OSX: and they weren't even my patches.
> > >
> > > -Steve
> > >
> > > (One of these days I should volunteer to be the release manager and
> > > it'll be obvious that Arun is being quite amenable to all the other
> > > developers)
> > >
> > > > IMO, it makes more sense to do this change during the beta rather
> > > > than when GA. That gives us more flexibility to iron out things if
> > > > necessary.
> > >
> > > I'm arguing this change can go into the beta of the successor to 2.1
> > > -not GA.
> >
> > What does "this change" refer to? Symlinks are already in 2.1, and the
> > existing semantics create problems for programs (eg see the pig example
> > in HADOOP-9912) that we need to resolve. I don't think do nothing is an
> > option for 2.2 GA.
> >
> > Thanks,
> > Eli
> >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> > > entity to which it is addressed and may contain information that is
> > > confidential, privileged and exempt from disclosure under applicable
> > > law. If the reader of this message is not the intended recipient, you
> > > are hereby notified that any printing, copying, dissemination,
> > > distribution, disclosure or forwarding of this communication is
> > > strictly prohibited. If you have received this communication in
> > > error, please contact the sender immediately and delete it from your
> > > system. Thank You.
>
> --
> Steve Loughran
> Hortonworks Inc
> stevel@hortonworks.com
> skype: steve_loughran
> tel: +1 408 400 3721

--089e0149ca3a25bd2a04e6ad3d8d--