hadoop-general mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject [DISCUSSION] Thinking about 20.204 and beyond
Date Sat, 18 Jun 2011 06:50:40 GMT
Hi Folks,

Along with starting a new release off the mainline (see previous mail), the Yahoo! team plans
to continue producing sustaining releases off the Hadoop with security branch, such as 0.20.203.
I'm writing this email to outline our plans, explain Yahoo's motivation for supporting
this work and request feedback and hopefully your endorsement.  This initiative stems from
Yahoo's commitment to do its hadoop work in Apache and discontinue the Yahoo Distribution
of Hadoop (http://yhoo.it/i9Ww8W).

We hope to produce a new 0.20.204 release in Apache in the next few weeks.  Owen O'Malley
is planning to act as release master for this release.  This will be based on work in the
hadoop-with-security branch, just as 0.20.203 was, but will include bugfixes and enhancements beyond
those in 0.20.203.  This is one in a series of releases we hope to do in the next 6-9 months
as hadoop 0.23 (or whatever the community chooses to call it) goes through the various stages
of stability testing and burn-in.

CONTENTS OF THE RELEASE:

Some highlights:

- RPM & .deb packaging to ease deployment (back ported from trunk)
 - I am excited to see hadoop released with .deb & RPM packaging from Apache for the first
time.
 - This will greatly ease deployment
- Disk fail in place (merged with trunk, except for some MR changes that conflict with MR-279;
these will be reimplemented in MR-279)
 - This change has been motivated by operational problems we observed with our new 12 disk
machines.
 - This work should greatly improve Hadoop availability by keeping nodes working when one
of their disks fails

- Lots of additional fixes (I've included the change log below)

WHY THIS PROCESS:

Producing a stable release of Hadoop is a long, hard and expensive process.  Historically
Y! has produced all such releases.  Other releases of Hadoop have either not been stable (Hadoop
0.19 and Hadoop 0.21) or have been based on a stable Apache release driven by Yahoo (CDH and
Facebook).  Once we've paid the price of making a stable release, it makes a lot of sense
to accept safe improvements as well as bug fixes.  Doing so allows one to get customer impacting
improvements into production in days, rather than years, which is what would happen if one
waited for changes to come in the next stable release off the Hadoop mainline.  Given that
it takes many months to stabilize trunk, there is no way to get new easy fixes into users'
hands quickly via a new mainline release.

For the last few years Yahoo has done sustaining engineering in open source via Github.  These
patches have been contributed to Apache Hadoop mainline and backported to the sustaining branch
on github (for yahoo 0.20 for example).  We've then cut Yahoo releases from Github.  Cloudera
and Facebook have also taken these patches from Github and incorporated these improvements
into their releases, so the community has benefitted from this process for years.  What we
are planning to do now is simply move this process into Apache, so that Apache releases themselves
are timely and relevant, not always a year or two behind what users need.

How do I propose making these decisions?  Deciding what is a safe patch is a judgement call.
 Apache process suggests that the release manager makes these calls (http://bit.ly/mJcBjc).
 For releases Y! champions, such as 0.20 (Arun & Owen), we are ready to do the sustaining
engineering, make these calls and stand our reputation behind the quality of the result. 
Other release masters are championing other Apache Hadoop releases currently (Nigel and Tom)
and I think they should be free to do the same.  For hadoop 20.204, I propose pushing what
is currently in the hadoop-with-security branch.  Part of the reason for this thread is to
socialize this process, so that community members can champion stable patches for inclusion
in 20.205.  In the future I propose that a branch's release master request suggestions for
future releases on this list, but is free to use their judgement on what is accepted (pretty
much what Nigel is doing on 0.22 today).

----

Conclusion:

The vote on 0.20.203 was acrimonious, but I believe that 0.20.203 was a useful step forward
for Apache Hadoop.  0.20.204 will again be the best stable release of Apache Hadoop ever.
 I hope folks can support the effort.  With your contribution 0.20.205 can be even better,
fixing issues that plague your Hadoop clusters.  This email is part of a wider effort from
the Yahoo team to co-plan our work with the community.

Thanks,

Eric14
---
eric14 a.k.a. Eric Baldeschwieler
VP Hadoop Software Development @Yahoo!


===============
===============

From CHANGES.txt:

Release 0.20.204.0 - unreleased

 NEW FEATURES

   HADOOP-6255. Create RPM and Debian packages for common. Changes deployment
   layout to be consistent across the binary tgz, rpm, and deb. Adds setup
   scripts for easy one node cluster configuration and user creation.
   (Eric Yang via omalley)

 BUG FIXES

   MAPREDUCE-2495. exit() the TaskTracker when the distributed cache cleanup
   thread dies. (Robert Joseph Evans via cdouglas)

   HDFS-1878. TestHDFSServerPorts unit test failure - race condition 
   in FSNamesystem.close() causes NullPointerException without serious
   consequence. (mattf)

   MAPREDUCE-2452. Moves the cancellation of delegation tokens to a separate
   thread. (ddas)

   MAPREDUCE-2555. Avoid spurious logging from completedtasks. (Thomas Graves
   via cdouglas)

   MAPREDUCE-2451. Log the details from health check script at the
   JobTracker. (Thomas Graves via cdouglas)

   MAPREDUCE-2535. Fix NPE in JobClient caused by retirement. (Robert Joseph
   Evans via cdouglas)

   MAPREDUCE-2456. Log the reduce taskID and associated TaskTrackers with
   failed fetch notifications in the JobTracker log.
   (Jeffrey Naisbitt via cdouglas)

   HDFS-2044. TestQueueProcessingStatistics failing automatic test due to 
   timing issues. (mattf)

   HADOOP-7248. Update eclipse target to generate .classpath from ivy config.
   (Thomas Graves and Tom White via cdouglas)

   MAPREDUCE-2558. Add queue-level metrics 0.20-security branch - test fix
   (Jeffrey Naisbitt via mahadev)

   HADOOP-7364. TestMiniMRDFSCaching fails if test.build.dir is set to 
   something other than build/test. (Thomas Graves via mahadev)

   HADOOP-7277. Add generation of run configurations to eclipse target.
   (Jeffrey Naisbitt and Philip Zeyliger via cdouglas)

   HADOOP-7373. Fix {start,stop}-{dfs,mapred} and hadoop-daemons.sh from
   trying to use the wrong bin directory. (omalley)

   HADOOP-7274. Fix typos in IOUtils. (Jonathan Eagles via cdouglas)

   HADOOP-7369. Fix permissions in tarball for sbin/* and libexec/* (omalley)

   MAPREDUCE-2479. Move distributed cache cleanup to a background task,
   backporting MAPREDUCE-1568. (Robert Joseph Evans via cdouglas)

   HADOOP-7356. Fix bin/hadoop scripts (eyang via omalley)

   HADOOP-7272. Remove unnecessary security related info logs. (suresh)

   MAPREDUCE-2514. Fix typo in TaskTracker ReinitTrackerAction log message.
   (Jonathan Eagles via cdouglas)

   HDFS-1906. Remove logging exception stack trace in client logs when one of
   the datanode targets to read from is not reachable. (suresh)

   MAPREDUCE-2490. Add logging to graylist and blacklist activity to aid
   diagnosis of related issues. (Jonathan Eagles via cdouglas)

   MAPREDUCE-2447. Fix Child.java to set Task.jvmContext sooner to avoid
   corner cases in error handling. (Siddharth Seth via acmurthy) 

   MAPREDUCE-2429. Validate JVM in TaskUmbilicalProtocol. (Siddharth Seth via
   acmurthy) 

   MAPREDUCE-2418. Show job errors in JobHistory page. (Siddharth Seth via
   acmurthy) 

   HDFS-1592. At Startup, Valid volumes required in FSDataset doesn't
   handle consistently with volumes tolerated. (Bharath Mundlapudi)

   HDFS-1598. Directory listing on hftp:// does not show
   .*.crc files.  (szetszwo)

   HDFS-1750. ListPathsServlet should not use HdfsFileStatus.getLocalName()
   to get file name since it may return an empty string.  (szetszwo)

   HDFS-1758. Make Web UI JSP pages thread safe. (Tanping via suresh)

   HDFS-1773. Do not show decommissioned datanodes, which are not in both
   include and exclude lists, on web and JMX interfaces.
   (Tanping Wang via szetszwo)

   MAPREDUCE-2409. Distinguish distributed cache artifacts localized as
   files, archives. (Siddharth Seth via cdouglas)

   MAPREDUCE-118. Fix Job.getJobID() to get the new ID as soon as it's 
   assigned. (Amareshwari Sriramadasu and Dick King via cdouglas)

   MAPREDUCE-2411. Force an exception when the queue has an invalid name or
   its ACLs are misconfigured. (Dick King via cdouglas)

   HDFS-1258. Clearing namespace quota on "/" corrupts fs image.  
   (Aaron T. Myers via szetszwo)

   HDFS-1189. Quota counts missed between clear quota and set quota.
   (John George via szetszwo)

   HDFS-1692. In secure mode, Datanode process doesn't exit when disks 
   fail. (bharathm via boryas)

   MAPREDUCE-2420. JobTracker should be able to renew delegation token 
   over HTTP (boryas)

   MAPREDUCE-2443. Fix TaskAspect for TaskUmbilicalProtocol.ping(..).
   (Siddharth Seth via szetszwo)

   HDFS-1842. Handle editlog opcode conflict with 0.20.203 during upgrade,
   by throwing an error to indicate the editlog needs to be empty.
   (suresh)

   HDFS-1377. Quota bug for partial blocks allows quotas to be violated. (eli)

   HDFS-2057. Wait time to terminate the threads causes unit tests to
   take longer time. (Bharath Mundlapudi via suresh)

 IMPROVEMENTS

   HADOOP-7144. Expose JMX metrics via JSON servlet. (Robert Joseph Evans via
   cdouglas)

   MAPREDUCE-2524. Port reduce failure reporting semantics from trunk, to
   fail faulty maps more aggressively. (Thomas Graves via cdouglas)

   MAPREDUCE-2529. Add support for regex-based shuffle metric counting
   exceptions. (Thomas Graves via cdouglas)

   HADOOP-7398. Suppress warnings about use of HADOOP_HOME. (omalley)

   MAPREDUCE-2415. Distribute the user task logs on to multiple disks.
   (Bharath Mundlapudi via omalley)

   MAPREDUCE-2413. TaskTracker should handle disk failures by reinitializing 
   itself. (Ravi Gummadi and Jagane Sundar via omalley)

   HDFS-1541. Not marking datanodes dead when namenode in safemode.
   (hairong)

   HDFS-1767. Namenode ignores non-initial block report from datanodes
   when in safemode during startup. (Matt Foley via suresh)

   MAPREDUCE-1251. c++ utils doesn't compile. (Eli Collins via shv)

   HADOOP-7330. Fix MetricsSourceAdapter to use the value instead of the
   object. (Luke Lu via omalley)

