hadoop-general mailing list archives

From Matt Foley <mfo...@hortonworks.com>
Subject 0.20.205 Sustaining Release branch plan and content plan
Date Wed, 07 Sep 2011 10:23:42 GMT
Hi all,
Over the past week a number of people have provided input for patches they
would like to see in 205, with reasons and risk evaluations; please
see the threads
"Content request for 0.20.205 Sustaining Release" and
"Add Append-HBase support in upcoming 20.205".
Thanks to all who took the effort to share this information with the list.

The various patches are grouped below, in numeric order for ease of review.
My proposed plan for creating branch-0.20.205 is at the end of this
message.

Comparing the requests with the patches currently in
branch-0.20-security, we have the following:

1. THESE PATCHES ARE ALREADY IN 20-security AND ARE REQUESTED FOR
INCLUSION IN 205:

    HADOOP-6833. IPC leaks call parameters when exceptions thrown.
        (Todd Lipcon via eli)
    HADOOP-6889. Make RPC to have an option to timeout - backport to
        0.20-security. Unit tests updated to 17/Aug/2011 version.
        (John George and Ravi Prakash via mattf)
    HADOOP-7314. Add support for throwing UnknownHostException when a
        host doesn't resolve. Needed for MAPREDUCE-2489.
        (Jeffrey Naisbitt via mattf)
    HADOOP-7432. Back-port HADOOP-7110 to 0.20-security: Implement
        chmod in NativeIO library. (Sherry Chen via mattf)
    HADOOP-7472. RPC client should deal with IP address change.
        (Kihwal Lee via suresh)
    HADOOP-7539. merge hadoop archive goodness from trunk to .20
        (John George via mahadev)
    HDFS-0142. Blocks that are being written by a client are stored in
        the blocksBeingWritten directory.
        (Dhruba Borthakur, Nicolas Spiegelberg, Todd Lipcon via dhruba)
    HDFS-0200. Support append and sync for hadoop 0.20 branch. (dhruba)
    HDFS-0561. Fix write pipeline READ_TIMEOUT. (Todd Lipcon via dhruba)
    HDFS-0606. Fix ConcurrentModificationException in
        invalidateCorruptReplicas. (Todd Lipcon via dhruba)
    HDFS-0630. Client can exclude specific nodes in the write
        pipeline. (Nicolas Spiegelberg via dhruba)
    HDFS-0724. Use a bidirectional heartbeat to detect stuck
        pipeline. (hairong)
    HDFS-0826. Allow a mechanism for an application to detect that
        datanode(s) have died in the write pipeline. (dhruba)
    HDFS-0895. Allow hflush/sync to occur in parallel with new writes
        to the file. (Todd Lipcon via hairong)
    HDFS-0988. Fix bug where savenameSpace can corrupt edits log.
        (Nicolas Spiegelberg via dhruba)
    HDFS-1054. remove sleep before retry for allocating a block.
        (Todd Lipcon via dhruba)
    HDFS-1057. Concurrent readers hit ChecksumExceptions if following
        a writer to very end of file (Sam Rash via dhruba)
    HDFS-1118. Fix socketleak on DFSClient. (Zheng Shao via dhruba)
    HDFS-1141. completeFile does not check lease ownership.
        (Todd Lipcon via dhruba)
    HDFS-1164. TestHdfsProxy is failing. (Todd Lipcon)
    HDFS-1202. DataBlockScanner throws NPE when updated before
        initialized. (Todd Lipcon)
    HDFS-1204. Lease expiration should recover single files, not
        entire lease holder (Sam Rash via dhruba)
    HDFS-1210. DFSClient should log exception when block recovery
        fails. (Todd Lipcon via dhruba)
    HDFS-1211. Block receiver should not log "rewind" packets at INFO
        level. (Todd Lipcon)
    HDFS-1346. DFSClient receives out of order packet ack. (hairong)
    HDFS-1520. Lightweight NameNode operation recoverLease to trigger
        lease recovery. (Hairong Kuang via dhruba)
    HDFS-1554. New semantics for recoverLease. (hairong)
    HDFS-1555. Disallow pipeline recovery if a file is already being
        lease recovered. (hairong)
    HDFS-1836. Thousand of CLOSE_WAIT socket. Contributed by Todd
        Lipcon, ported to security branch by Bharath Mundlapudi.
        (via mattf)
    HDFS-2053. Bug in INodeDirectory#computeContentSummary warning
        (Michael Noll via eli)
    HDFS-2117. DiskChecker#mkdirsWithExistsAndPermissionCheck may
        return true even when the dir is not created. (eli)
    HDFS-2190. NN fails to start if it encounters an empty or
        malformed fstime file. (atm)
    HDFS-2202. Add a new DFSAdmin command to set balancer bandwidth of
        datanodes without restarting. (Eric Payne via szetszwo)
    MAPREDUCE-2187. Reporter sends progress during sort/merge.
        (Anupam Seth via acmurthy)
    MAPREDUCE-2324. Removed usage of broken
        ResourceEstimator.getEstimatedReduceInputSize to check against
        usable disk-space on TaskTracker. (Robert Evans via acmurthy)
    MAPREDUCE-2489. Jobsplits with random hostnames can make the
        queue unusable. (Jeffrey Naisbitt via mahadev)
    MAPREDUCE-2494. Make the distributed cache delete entries using
        LRU priority (Robert Joseph Evans via mahadev)
    MAPREDUCE-2650. back-port MAPREDUCE-2238 to 0.20-security.
        (Sherry Chen via mahadev)
    MAPREDUCE-2705. Implements launch of multiple tasks concurrently.
        (Thomas Graves via ddas)
    MAPREDUCE-2729. Ensure jobs with reduces which can't be launched
        due to slow-start do not count for user-limits.
        (Sherry Chen via acmurthy)
    MAPREDUCE-2780. Use a utility method to set service in token.
        (Daryn Sharp via jitendra)
    MAPREDUCE-2852. Jira for YDH bug 2854624. (Kihwal Lee via eli)

2. THESE PATCHES ARE ALREADY IN 20-security BUT NO ONE HAS YET SPOKEN FOR
INCLUDING THEM IN 205:

    HADOOP-7400. Fix HdfsProxyTests fails when the -Dtest.build.dir
        and -Dbuild.test is set to a dir other than build dir (gkesavan).
    HADOOP-7594. Support HTTP REST in HttpServer. (szetszwo)
    HADOOP-7596. Makes packaging of 64-bit jsvc possible. Has other
        bug fixes to do with packaging. (Eric Yang via ddas)
    HDFS-1207. FSNamesystem.stallReplicationWork should be volatile.
        (Todd Lipcon via dhruba)
    HDFS-2259. DN web-UI doesn't work with paths that contain html. (eli)
    HDFS-2309. TestRenameWhileOpen fails. (jitendra)
    MAPREDUCE-7343. Make the number of warnings accepted by test-patch
        configurable to limit false positives.
        (Thomas Graves via cdouglas)

3. THESE PATCHES ARE REQUESTED FOR INCLUSION IN 205, BUT ARE NOT YET
IN 20-security:

Additional append issues (proponents Todd and Suresh):
HADOOP-6722   Workaround a TCP spec quirk by not allowing
NetUtils.connect to connect to itself
HDFS-0611     Heartbeats times from Datanodes increase when there are
plenty of blocks to delete
HDFS-0915     Write pipeline hangs for too long when ResponseProcessor
hits timeout
HDFS-1056     Multi-node RPC deadlocks during block recovery
HDFS-1122     Don't allow client verification to prematurely add
inprogress blocks to DataBlockScanner
HDFS-1186     0.20: DNs should interrupt writers at start of recovery
HDFS-1197     Blocks are considered "complete" prematurely after
commitBlockSynchronization or DN restart
HDFS-1218     20 append: Blocks recovered on startup should be treated
with lower priority during block synchronization
HDFS-1242     0.20 append: Add test for appendFile() race solved in HDFS-142
HDFS-1247     Improvements to HDFS-1204 test
HDFS-1248     Misc cleanup/logging improvements for branch-20-append
HDFS-1252     TestDFSConcurrentFileOperations broken in 0.20-append
HDFS-1254     Support append/sync via the default configuration.
HDFS-1260     0.20: Block lost when multiple DNs trying to recover it
to different genstamps
HDFS-1262     Failed pipeline creation during append leaves lease hanging on NN
HDFS-1264     0.20: OOME in HDFS client made an unrecoverable HDFS block
HDFS-1266     Missing license headers in branch-20-append
HDFS-1779     After NameNode restart, clients cannot read partial
files even after client invokes Sync.
HDFS-2300     TestFileAppend4 and TestMultiThreadedSync fail on 20.append

other issues (with proponents' names):
Suresh      HADOOP-7119     add Kerberos HTTP SPNEGO authentication support
                            to Hadoop JT/NN/DN/TT web-consoles
Nathan      HADOOP-7510     Tokens should use original hostname provided
                            instead of ip
John George HADOOP-7602     wordcount, sort etc on har files fails with NPE
Nathan      HDFS-2257       HftpFileSystem should implement
                            GetDelegationTokens
Arun/Matei  MAPREDUCE-0551  Add preemption to the fair scheduler
Arun/Matei  MAPREDUCE-0706  Support for FIFO pools in the fair scheduler
Arun/Matei  and other FairScheduler-related items
Venu        MAPREDUCE-2237  Lost heartbeat response containing MapTask
                            throws NPE when it is resent
Venu        MAPREDUCE-2264  Job status exceeds 100% in some cases
Bharath     MAPREDUCE-2413  TaskTracker should handle disk failures at
                            both startup and runtime
Bharath     MAPREDUCE-2415  Distribute TaskTracker userlogs onto multiple
                            disks
Venu        MAPREDUCE-2549  Potential resource leaks in HadoopServer.java,
                            RunOnHadoopWizard.java and Environment.java
Joep        MAPREDUCE-2610  Inconsistent API
                            JobClient.getQueueAclsForCurrentUser
Nathan      MAPREDUCE-2621  TestCapacityScheduler fails with Queue q1 does
                            not exist
Nathan      MAPREDUCE-2651  Race condition in Linux Task Controller for
                            job log directory creation
Nathan      MAPREDUCE-2764  Fix renewal of dfs delegation tokens
Joep        MAPREDUCE-2779  JobSplitWriter.java can't handle large
                            job.split file
Nathan      MAPREDUCE-2915  LinuxTaskController does not work when
                            JniBasedUnixGroupsNetgroupMapping or
                            JniBasedUnixGroupsMapping is enabled

Obviously plenty of material has accumulated while 204 was being
stabilized.  I would like to proceed with 205 relatively quickly.  I plan
to create the release branch (branch-0.20.205) this weekend, 10 September.

The items in group 1 are acceptable to me and will be included.

The items in group 2 are acceptable for inclusion, but someone needs to
speak up for them in the next three days.  I'm not going to include
anything that nobody wants!

The items in group 3 will be the subject of further discussion between
their proponents and me.  If they can be committed WITH UNIT TESTS AND
APPROPRIATE LEVELS OF TESTING, I'm still willing to see them in 205.  Most
of the items in the first sub-group (additional append issues) have in
fact been well-tested in CDH releases, and would be valuable to include in
205; it's just a matter of getting the existing patches committed.  About
half of them seem to need no changes; the other half may need to be
re-based.

Again, I plan to create the release branch this weekend.  Thanks for
everyone's contributions.

--Matt
