Mailing-List: contact hdfs-dev-help@hadoop.apache.org; run by ezmlm
Delivered-To: mailing list hdfs-dev@hadoop.apache.org
References: <732DB60A-844D-4FA3-822B-08DB587DEB9B@apache.org> <6192A234-0F2E-4CB4-B3C7-142728E0800A@effectivemachines.com> <1471528833445.31857@hortonworks.com>
From: Andrew Purtell
Date: Thu, 18 Aug 2016 10:37:00 -0700
Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
To: Chris Nauroth
Cc: Junping Du, Vinod Kumar Vavilapalli, Allen Wittenauer, "common-dev@hadoop.apache.org", "hdfs-dev@hadoop.apache.org", "yarn-dev@hadoop.apache.org", "mapreduce-dev@hadoop.apache.org"
archived-at: Thu, 18 Aug 2016 17:37:51 -0000

> What is a realistic strategy for us to evolve the HDFS audit log in a
> backward-compatible way? If the API is essentially any form of ad-hoc
> scripting, then for any proposed audit log format change, I can find a
> reason to veto it on grounds of backward incompatibility.

Yeah, when log scraping is the only way to get at information, the API surface expands to cover all manner of ad-hoc scripting.

Not sure moving away from emitting audit information in log lines would be operator friendly. That's a tough one. Just about everything in the ecosystem emits audit information as log lines. If Hadoop switched strategy and became a one-off doing something different, this would be painful.

Assuming log lines will remain the way we receive audit events from Hadoop/HDFS, please consider freezing any changes to audit logging today, developing a formal specification, adding the specification to the documentation, and then taking care not to break the specification between releases. Because audit logging from the NN comes from low-level places in FSNamesystem, this is going to constrain maintenance and refactoring of that and related code, so with my software maintainer hat on I feel your pain in advance. You'll want to hash out what level of compatibility you'd like to offer. I'd recommend only changing on major releases.

On Thu, Aug 18, 2016 at 10:04 AM, Chris Nauroth wrote:

> Andrew, thanks for adding your perspective on this.
>
> What is a realistic strategy for us to evolve the HDFS audit log in a
> backward-compatible way? If the API is essentially any form of ad-hoc
> scripting, then for any proposed audit log format change, I can find a
> reason to veto it on grounds of backward incompatibility.
>
> - I can’t add a new field on the end, because that would break an awk
>   script that uses $NF expecting to find a specific field.
> - I can’t prepend a new field, because that would break a "cut -f1"
>   expecting to find the timestamp.
> - HDFS can’t add any new features, because someone might have written a
>   script that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
> - Hadoop is not allowed to add full IPv6 support, because someone might
>   have written a script that looks at the "ip=" field and parses it by IPv4
>   syntax.
>
> On the CLI, a potential solution for evolving the output is to preserve
> the old format by default and only enable the new format if the user
> explicitly passes a new argument. What should we do for the audit log?
> Configuration flags in hdfs-site.xml? (That of course adds its own brand
> of complexity.)
>
> I’m particularly interested to hear potential solutions from people like
> Andrew and Allen who have been most vocal about the need for a stable
> format. Without a solution, this unfortunately devolves into the format
> being frozen within a major release line.
>
> We could benefit from getting a patch on the compatibility doc that
> addresses the HDFS audit log specifically.
>
> --Chris Nauroth
>
> On 8/18/16, 8:47 AM, "Andrew Purtell" wrote:
>
> An incompatible APIs change is developer unfriendly. An incompatible
> behavioral change is operator unfriendly. Historically, one dimension of
> incompatibility has had a lot more mindshare than the other. It's great
> that this might be changing for the better.
>
> Where I work, when we move from one Hadoop 2.x minor to another we
> always spend time updating our deployment plans, alerting, log scraping,
> and related things due to changes. Whether some of these qualify for
> the 'incompatible' designation is debatable. I think the audit logging
> change that triggered this discussion is a good example of one that does.
> If you want to audit HDFS actions, those log emissions are your API.
> (Inotify doesn't offer access control events.) One has to code regular
> expressions for parsing them and reverse engineer under what circumstances
> an audit line is emitted so you can make assumptions about what transpired.
> Change either and you might break someone's automation for meeting industry
> or legal compliance obligations. Not a trivial matter. If you don't operate
> Hadoop in production you might not realize the implications of such a
> change. Glad to see Hadoop has community diversity to recognize it in some
> cases.
>
> > On Aug 18, 2016, at 6:57 AM, Junping Du wrote:
> >
> > I think Allen's previous comments are very misleading.
> > In my understanding, only incompatible API changes (RPC, CLIs,
> > WebService, etc.) shouldn't land on branch-2, but other incompatible
> > behaviors (logs, audit-log, daemon's restart, etc.) should have some
> > flexibility to land. Otherwise, how could 52 issues
> > (https://s.apache.org/xJk5) marked with incompatible-changes have
> > landed on branch-2 after the 2.2.0 release? Most of them are already
> > released.
> >
> > Thanks,
> >
> > Junping
> > ________________________________________
> > From: Vinod Kumar Vavilapalli
> > Sent: Wednesday, August 17, 2016 9:29 PM
> > To: Allen Wittenauer
> > Cc: common-dev@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> > yarn-dev@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
> >
> > I always look at CHANGES.txt entries for incompatible-changes and
> > this JIRA obviously wasn’t there.
> >
> > Anyways, this shouldn’t be in any of branch-2.* as committers there
> > clearly mentioned that this is an incompatible change.
> >
> > I am reverting the patch from branch-2*.
> >
> > Thanks
> > +Vinod
> >
> >> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer <aw@effectivemachines.com> wrote:
> >>
> >>
> >>
> >> -1
> >>
> >> HDFS-9395 is an incompatible change:
> >>
> >> a) Why is it not marked as such in the changes file?
> >> b) Why is an incompatible change in a micro release, much less a minor?
> >> c) Where is the release note for this change?
> >>
> >>
> >>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
> >>>
> >>> As discussed before, this is the next maintenance release to follow up 2.7.2.
> >>>
> >>> The RC is available for validation at: http://home.apache.org/~vinodkv/hadoop-2.7.3-RC1/
> >>>
> >>> The RC tag in git is: release-2.7.3-RC1
> >>>
> >>> The maven artifacts are available via repository.apache.org <http://repository.apache.org/> at https://repository.apache.org/content/repositories/orgapachehadoop-1045/
> >>>
> >>> The release-notes are inside the tar-balls at location hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I hosted this at home.apache.org/~vinodkv/hadoop-2.7.3-RC1/releasenotes.html for your quick perusal.
> >>>
> >>> As you may have noted,
> >>> - a few issues with RC0 forced an RC1 [1]
> >>> - a very long fix-cycle for the License & Notice issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release) to slip by quite a bit. This release's related discussion thread is linked below: [2].
> >>>
> >>> Please try the release and vote; the vote will run for the usual 5 days.
> >>>
> >>> Thanks,
> >>> Vinod
> >>>
> >>> [1] [VOTE] Release Apache Hadoop 2.7.3 RC0: https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106
> >>> [2]: 2.7.3 release plan: https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-help@hadoop.apache.org
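
The breakage cases Chris lists in the thread (awk with $NF, "cut -f1", IPv4-only parsing of the "ip=" field) all come from scripts that depend on field position or on one address syntax. As an illustration only, here is a minimal Python sketch of a consumer that looks fields up by key instead, so an added field does not break it. It assumes tab-separated key=value fields. The sample line, the `newField` key, and the helper names `parse_audit_fields` and `client_address` are made up for this example and do not come from Hadoop or from any specification.

```python
import ipaddress


def parse_audit_fields(line):
    """Return the key=value fields of one log line as a dict.

    Assumption: fields are separated by tabs. Tokens without '=' are
    skipped, and any prefix (timestamp, logger name) in front of the
    first key is dropped.
    """
    fields = {}
    for token in line.rstrip("\n").split("\t"):
        key, sep, value = token.partition("=")
        words = key.split()
        if sep and words:
            # Keep only the last word before '=' as the field name.
            fields[words[-1]] = value
    return fields


def client_address(fields):
    """Parse the 'ip' field as IPv4 or IPv6, tolerating a leading '/'."""
    return ipaddress.ip_address(fields["ip"].lstrip("/"))


# Hypothetical sample line with an extra trailing field that a
# positional script (awk $NF) would misread.
sample = (
    "2016-08-18 10:37:00,123 INFO audit: allowed=true\tugi=alice\t"
    "ip=/2001:db8::1\tcmd=listStatus\tsrc=/tmp\tnewField=x"
)
fields = parse_audit_fields(sample)
print(fields["cmd"], client_address(fields).version)
```

A consumer written this way still breaks if a key is renamed or if a value's syntax changes, which is the gap a written specification of the audit log format would close.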