From: Eric Baldeschwieler
To: general@hadoop.apache.org
Subject: [DISCUSSION] Thinking about 20.204 and beyond
Date: Fri, 17 Jun 2011 23:50:40 -0700

Hi Folks,

Along with starting a new release off the mainline (see previous mail),
the Yahoo! team plans to continue producing sustaining releases off the
Hadoop with security branch, such as 0.20.203. I'm writing this email
to outline our plans, explain Yahoo's motivation for supporting this
work, and request feedback and, hopefully, your endorsement. This
initiative stems from Yahoo's commitment to do its Hadoop work in Apache
and discontinue the Yahoo Distribution of Hadoop
(http://yhoo.it/i9Ww8W).

We hope to produce a new 0.20.204 release in Apache in the next few
weeks. Owen O'Malley is planning to act as release master for this
release. This will be based on work in the hadoop-with-security branch,
just as 0.20.203 was, but will include bugfixes and enhancements beyond
those in 0.20.203. This is one in a series of releases we hope to do in
the next 6-9 months as Hadoop 0.23 (or whatever the community chooses to
call it) goes through the various stages of stability testing and
burn-in.

CONTENTS OF THE RELEASE:

Some highlights:

- RPM & .deb packaging to ease deployment (back ported from trunk)
  - I am excited to see Hadoop released with .deb & RPM packaging from
    Apache for the first time.
  - This will greatly ease deployment
- Disk fail in place (merged with trunk, except for some MR changes
  that conflict with MR-279; these will be reimplemented in MR-279)
  - This change has been motivated by operational problems we observed
    with our new 12 disk machines.
  - This work should greatly improve Hadoop availability by keeping nodes
    working when one of their disks fails
- Lots of additional fixes (I've included the change log below)

WHY THIS PROCESS:

Producing a stable release of Hadoop is a long, hard and expensive
process. Historically Y! has produced all such releases. Other
releases of Hadoop have either not been stable (Hadoop 0.19 and Hadoop
0.21) or have been based on a stable Apache release driven by Yahoo (CDH
and Facebook). Once we've paid the price of making a stable release, it
makes a lot of sense to accept safe improvements as well as bug fixes.
Doing so allows one to get customer-impacting improvements into
production in days, rather than years, which is what would happen if one
waited for changes to come in the next stable release off the Hadoop
mainline. Given that it takes many months to stabilize trunk, there is
no way to get new easy fixes into users' hands quickly via a new
mainline release.

For the last few years Yahoo has done sustaining engineering in open
source via GitHub. These patches have been contributed to Apache Hadoop
mainline and backported to the sustaining branch on GitHub (for Yahoo
0.20, for example). We've then cut Yahoo releases from GitHub. Cloudera
and Facebook have also taken these patches from GitHub and incorporated
these improvements into their releases, so the community has benefitted
from this process for years. What we are planning to do now is simply
move this process into Apache, so that Apache releases themselves are
timely and relevant, not always a year or two behind what users need.

How do I propose making these decisions? Deciding what is a safe patch
is a judgement call. Apache process suggests that the release manager
makes these calls (http://bit.ly/mJcBjc). For releases Y!
champions, such as 0.20 (Arun & Owen), we are ready to do the sustaining
engineering, make these calls and put our reputation behind the
quality of the result. Other release masters are championing other
Apache Hadoop releases currently (Nigel and Tom) and I think they should
be free to do the same. For Hadoop 0.20.204, I propose pushing what is
currently in the hadoop-with-security branch. Part of the reason for
this thread is to socialize this process, so that community members can
champion stable patches for inclusion in 0.20.205. In the future I
propose that a branch's release master request suggestions for future
releases on this list, but remain free to use their judgement on what is
accepted (pretty much what Nigel is doing on 0.22 today).

----

Conclusion:

The vote on 0.20.203 was acrimonious, but I believe that 0.20.203 was a
useful step forward for Apache Hadoop. 0.20.204 will again be the best
stable release of Apache Hadoop ever. I hope folks can support the
effort. With your contribution 0.20.205 can be even better, fixing
issues that plague your Hadoop clusters. This email is part of a wider
effort from the Yahoo team to co-plan our work with the community.

Thanks,

Eric14

---
eric14 a.k.a. Eric Baldeschwieler
VP Hadoop Software Development @Yahoo!

===============
===============

From CHANGES.txt:

Release 0.20.204.0 - unreleased

  NEW FEATURES

    HADOOP-6255. Create RPM and Debian packages for common. Changes
    deployment layout to be consistent across the binary tgz, rpm, and
    deb. Adds setup scripts for easy one node cluster configuration and
    user creation. (Eric Yang via omalley)

  BUG FIXES

    MAPREDUCE-2495. exit() the TaskTracker when the distributed cache
    cleanup thread dies. (Robert Joseph Evans via cdouglas)

    HDFS-1878. TestHDFSServerPorts unit test failure - race condition
    in FSNamesystem.close() causes NullPointerException without serious
    consequence.
    (mattf)

    MAPREDUCE-2452. Moves the cancellation of delegation tokens to a
    separate thread. (ddas)

    MAPREDUCE-2555. Avoid spurious logging from completedtasks. (Thomas
    Graves via cdouglas)

    MAPREDUCE-2451. Log the details from health check script at the
    JobTracker. (Thomas Graves via cdouglas)

    MAPREDUCE-2535. Fix NPE in JobClient caused by retirement. (Robert
    Joseph Evans via cdouglas)

    MAPREDUCE-2456. Log the reduce taskID and associated TaskTrackers
    with failed fetch notifications in the JobTracker log.
    (Jeffrey Naisbitt via cdouglas)

    HDFS-2044. TestQueueProcessingStatistics failing automatic test due
    to timing issues. (mattf)

    HADOOP-7248. Update eclipse target to generate .classpath from ivy
    config. (Thomas Graves and Tom White via cdouglas)

    MAPREDUCE-2558. Add queue-level metrics 0.20-security branch - test
    fix (jeffrey nasbit via mahadev)

    HADOOP-7364. TestMiniMRDFSCaching fails if test.build.dir is set to
    something other than build/test. (Thomas Graves via mahadev)

    HADOOP-7277. Add generation of run configurations to eclipse target.
    (Jeffrey Naisbitt and Philip Zeyliger via cdouglas)

    HADOOP-7373. Fix {start,stop}-{dfs,mapred} and hadoop-daemons.sh from
    trying to use the wrong bin directory. (omalley)

    HADOOP-7274. Fix typos in IOUtils. (Jonathan Eagles via cdouglas)

    HADOOP-7369. Fix permissions in tarball for sbin/* and libexec/*
    (omalley)

    MAPREDUCE-2479. Move distributed cache cleanup to a background task,
    backporting MAPREDUCE-1568. (Robert Joseph Evans via cdouglas)

    HADOOP-7356. Fix bin/hadoop scripts (eyang via omalley)

    HADOOP-7272. Remove unnecessary security related info logs. (suresh)

    MAPREDUCE-2514. Fix typo in TaskTracker ReinitTrackerAction log
    message. (Jonathan Eagles via cdouglas)

    HDFS-1906. Remove logging exception stack trace in client logs when
    one of the datanode targets to read from is not reachable. (suresh)

    MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid
    diagnosis of related issues.
    (Jonathan Eagles via cdouglas)

    MAPREDUCE-2447. Fix Child.java to set Task.jvmContext sooner to avoid
    corner cases in error handling. (Siddharth Seth via acmurthy)

    MAPREDUCE-2429. Validate JVM in TaskUmbilicalProtocol. (Siddharth
    Seth via acmurthy)

    MAPREDUCE-2418. Show job errors in JobHistory page. (Siddharth Seth
    via acmurthy)

    HDFS-1592. At Startup, Valid volumes required in FSDataset doesn't
    handle consistently with volumes tolerated. (Bharath Mundlapudi)

    HDFS-1598. Directory listing on hftp:// does not show .*.crc files.
    (szetszwo)

    HDFS-1750. ListPathsServlet should not use
    HdfsFileStatus.getLocalName() to get file name since it may return an
    empty string. (szetszwo)

    HDFS-1758. Make Web UI JSP pages thread safe. (Tanping via suresh)

    HDFS-1773. Do not show decommissioned datanodes, which are not in
    both include and exclude lists, on web and JMX interfaces.
    (Tanping Wang via szetszwo)

    MAPREDUCE-2409. Distinguish distributed cache artifacts localized as
    files, archives. (Siddharth Seth via cdouglas)

    MAPREDUCE-118. Fix Job.getJobID() to get the new ID as soon as it's
    assigned. (Amareshwari Sriramadasu and Dick King via cdouglas)

    MAPREDUCE-2411. Force an exception when the queue has an invalid name
    or its ACLs are misconfigured. (Dick King via cdouglas)

    HDFS-1258. Clearing namespace quota on "/" corrupts fs image.
    (Aaron T. Myers via szetszwo)

    HDFS-1189. Quota counts missed between clear quota and set quota.
    (John George via szetszwo)

    HDFS-1692. In secure mode, Datanode process doesn't exit when disks
    fail. (bharathm via boryas)

    MAPREDUCE-2420. JobTracker should be able to renew delegation token
    over HTTP (boryas)

    MAPREDUCE-2443. Fix TaskAspect for TaskUmbilicalProtocol.ping(..).
    (Siddharth Seth via szetszwo)

    HDFS-1842. Handle editlog opcode conflict with 0.20.203 during
    upgrade, by throwing an error to indicate the editlog needs to be
    empty. (suresh)

    HDFS-1377. Quota bug for partial blocks allows quotas to be violated.
    (eli)

    HDFS-2057. Wait time to terminate the threads causes unit tests to
    take longer time. (Bharath Mundlapudi via suresh)

  IMPROVEMENTS

    HADOOP-7144. Expose JMX metrics via JSON servlet. (Robert Joseph
    Evans via cdouglas)

    MAPREDUCE-2524. Port reduce failure reporting semantics from trunk,
    to fail faulty maps more aggressively. (Thomas Graves via cdouglas)

    MAPREDUCE-2529. Add support for regex-based shuffle metric counting
    exceptions. (Thomas Graves via cdouglas)

    HADOOP-7398. Suppress warnings about use of HADOOP_HOME. (omalley)

    MAPREDUCE-2415. Distribute the user task logs on to multiple disks.
    (Bharath Mundlapudi via omalley)

    MAPREDUCE-2413. TaskTracker should handle disk failures by
    reinitializing itself. (Ravi Gummadi and Jagane Sundar via omalley)

    HDFS-1541. Not marking datanodes dead when namenode in safemode.
    (hairong)

    HDFS-1767. Namenode ignores non-initial block report from datanodes
    when in safemode during startup. (Matt Foley via suresh)

    MAPREDUCE-1251. c++ utils doesn't compile. (Eli Collins via shv)

    HADOOP-7330. Fix MetricsSourceAdapter to use the value instead of the
    object. (Luke Lu via omalley)