hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Add Append-HBase support in upcoming 20.205
Date Fri, 02 Sep 2011 20:03:28 GMT
The following other JIRAs have been in CDH for about 18 months, for the
purpose of HBase. You may want to consider backporting them as well;
many were never committed to 0.20-append due to a lack of reviews by
HDFS committers at the time.

    HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiv

    Description: Fixes the logic by which datanodes identify local RPC targets
                 during block recovery for the case when the datanode
                 is configured with an ephemeral data transceiver port.
    Reason: Potential internode deadlock for clusters using ephemeral ports

    HADOOP-6722. Work around a TCP spec quirk by not allowing NetUtils.connect to connect to itself

    Description: TCP's ephemeral port assignment results in the possibility
                 that a client can connect back to its own outgoing socket,
                 resulting in failed RPCs or datanode transfers.
    Reason: Fixes intermittent errors in cluster testing with ephemeral
            IPC/transceiver ports on datanodes.
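
The self-connect quirk can be guarded against by comparing a socket's local endpoint with its remote endpoint right after connecting: if they are identical, the kernel has connected the socket to itself (on Linux, for example, via TCP "simultaneous open") rather than to a server. The Python sketch below only illustrates that idea; connect_checked and is_self_connection are hypothetical helpers, not Hadoop's NetUtils.connect code.

```python
import socket


def is_self_connection(local_addr, remote_addr):
    """True when a TCP socket's local endpoint equals its remote endpoint.

    With ephemeral ports, a client can be assigned the very port it is
    dialing on the same host; the connection then loops back to itself.
    """
    return local_addr == remote_addr


def connect_checked(host, port, timeout=5.0):
    """Connect to (host, port), refusing a connection that looped back.

    Hypothetical helper illustrating the idea behind HADOOP-6722.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    if is_self_connection(sock.getsockname(), sock.getpeername()):
        sock.close()
        raise ConnectionRefusedError(f"socket connected to itself at {host}:{port}")
    return sock
```

Failing fast with a connection error lets the caller retry and obtain a different ephemeral port, instead of sending RPC or transfer traffic into its own socket.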

    HDFS-1122. Don't allow client verification to prematurely add in-progress blocks to DataBlockScanner

    Description: When a client reads a block that is also open for writing,
                 it should not add it to the datanode block scanner.
                 If it does, the block scanner can incorrectly mark the
                 block as corrupt, causing data loss.
    Reason: Potential data loss in the concurrent writer/reader case.
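
A minimal sketch of the HDFS-1122 rule, assuming each replica carries a flag saying whether it has been finalized: a client's successful verification is only forwarded to the block scanner for finalized replicas. Replica, BlockScanner and add_verified are hypothetical names for illustration, not the DataBlockScanner API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    block_id: int
    finalized: bool  # False while the block is still open for writing


@dataclass
class BlockScanner:
    """Toy stand-in for a datanode block scanner (hypothetical)."""
    queue: list = field(default_factory=list)

    def add_verified(self, replica: Replica) -> bool:
        # A block still being written can change length and checksums after
        # the reader's pass, so only finalized replicas are accepted.
        if not replica.finalized:
            return False
        self.queue.append(replica.block_id)
        return True
```

Rejecting in-progress replicas at this entry point keeps the scanner from later comparing a stale verification against a block that has since grown.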

    HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch

    Description: Miscellaneous code cleanup and logging changes, including:
     - Slight cleanup to recoverFile() function in TestFileAppend4
     - Improve error messages on OP_READ_BLOCK
     - Some comment cleanup in FSNamesystem
     - Remove toInodeUnderConstruction (was not used)
     - Add some checks for null blocks in FSNamesystem to avoid a possible NPE
     - Only log "inconsistent size" warnings at WARN level for non-under-construction blocks.
     - Redundant addStoredBlock calls are also not worthy of WARN level
     - Add some extra information to a warning in ReplicationTargetChooser
    Reason: Improves diagnosis of error cases and clarity of code

    HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142

    Reason: Test coverage for previously applied patch.

    HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas.

    Description: If a datanode loses power and then recovers, its replicas
                 may be truncated due to the recovery of the local FS
                 journal. This patch ensures that a replica truncated by
                 a power loss does not truncate the block on HDFS.
    Reason: Potential data-loss bug uncovered by power-failure simulation
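
The HDFS-1218 idea can be sketched as a length-selection rule during block recovery. The sketch assumes recovery synchronizes all replicas to the shortest participating one, and that each replica records whether it was reloaded when its datanode restarted; ReplicaInfo and choose_recovery_length are hypothetical names, and the exact HDFS logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaInfo:
    datanode: str
    length: int
    # True if the replica was reloaded from disk when its datanode restarted,
    # so a filesystem journal replay after power loss may have truncated it.
    recovered_on_startup: bool


def choose_recovery_length(replicas: List[ReplicaInfo]) -> int:
    """Pick the length all replicas are synchronized to (sketch).

    Startup-recovered replicas are ignored whenever at least one replica
    from a datanode that stayed up is available.
    """
    if not replicas:
        raise ValueError("no replicas to recover")
    trusted = [r for r in replicas if not r.recovered_on_startup]
    candidates = trusted or replicas
    return min(r.length for r in candidates)
```

Without the filter, a single replica truncated to 1024 bytes by a power loss would drag two intact 4096-byte replicas down to 1024 bytes.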

    HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout

    Description: Previously, the write pipeline would hang for the entire write
                 timeout when it encountered a read timeout (e.g. due to a
                 network connectivity issue). This patch interrupts the writing
                 thread when a read error occurs.
    Reason: Faster recovery from pipeline failure for HBase and other
            interactive applications.
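
The HDFS-915 behaviour can be modelled with a writer that blocks until acks free up room in its window, and a response-processor callback that records the error and wakes the writer immediately. Python threads cannot be interrupted the way Java threads can, so this sketch (an assumption, not HDFS code) uses a condition variable plus an error flag; Pipeline, PipelineError and the method names are hypothetical.

```python
import threading


class PipelineError(Exception):
    """Raised to the writer when the pipeline has already failed."""


class Pipeline:
    """Toy write pipeline: the writer waits for free space in the ack window."""

    def __init__(self, window: int):
        self.window = window
        self.unacked = 0
        self.error = None
        self.cond = threading.Condition()

    def write_packet(self, write_timeout: float) -> None:
        with self.cond:
            # Wake up either when there is window space or when an error is reported.
            ok = self.cond.wait_for(
                lambda: self.error is not None or self.unacked < self.window,
                timeout=write_timeout,
            )
            if self.error is not None:
                raise PipelineError(f"pipeline failed: {self.error}")
            if not ok:
                raise TimeoutError("write timed out")
            self.unacked += 1

    def on_ack(self) -> None:
        with self.cond:
            self.unacked -= 1
            self.cond.notify_all()

    def on_read_timeout(self) -> None:
        # Called by the response-processor thread: record the failure and wake
        # the writer now instead of letting it sit out the full write timeout.
        with self.cond:
            self.error = "ack read timed out"
            self.cond.notify_all()
```

The writer learns about the failure within milliseconds of the read timeout and can start pipeline recovery, which is what matters for latency-sensitive clients such as HBase.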

    HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed.

    Description: When the write pipeline recovery process is initiated, the
                 patch interrupts any concurrent writers to the block under
                 recovery. This prevents a case where some edits may be lost
                 if the writer has lost its lease but continues to write
                 (e.g. due to a garbage collection pause).
    Reason: Fixes a potential data-loss bug
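
A minimal sketch of the HDFS-1186 ordering, assuming a datanode-side replica object that fences old writers at the moment recovery starts (modelling "interrupt" as refusing further writes) and bumps the generation stamp. ReplicaBeingWritten, WriterFencedError and the method names are hypothetical.

```python
import threading


class WriterFencedError(Exception):
    """Raised when a writer from the old pipeline tries to write."""


class ReplicaBeingWritten:
    """Hypothetical replica that cuts off writers as soon as recovery begins."""

    def __init__(self, gen_stamp: int):
        self.gen_stamp = gen_stamp
        self.recovering = False
        self.data = bytearray()
        self.lock = threading.Lock()

    def write(self, writer_gen_stamp: int, chunk: bytes) -> None:
        with self.lock:
            # Refuse writes during recovery and from writers holding a stale
            # generation stamp (e.g. one that resumed after a long GC pause).
            if self.recovering or writer_gen_stamp != self.gen_stamp:
                raise WriterFencedError("replica is under recovery; writer fenced")
            self.data.extend(chunk)

    def start_recovery(self, new_gen_stamp: int) -> int:
        with self.lock:
            self.recovering = True
            self.gen_stamp = new_gen_stamp
            return len(self.data)  # length recovery will work with

    def finish_recovery(self) -> None:
        with self.lock:
            self.recovering = False
```

Because fencing happens in start_recovery rather than finish_recovery, the length returned there cannot be changed afterwards by a lingering writer.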

commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
Author: Todd Lipcon <todd@lipcon.org>
Date:   Sun Jun 13 23:02:38 2010 -0700

    HDFS-1197. Received blocks should not be added to block map prematurely for under construction files

    Description: Fixes a possible data-loss scenario when using append() on
                 real-life clusters. Also augments unit tests to uncover
                 similar bugs in the future by simulating latency when
                 reporting blocks received by datanodes.
    Reason: Append support data-loss bug
    Author: Todd Lipcon

    HDFS-1260. tryUpdateBlock should do validation before renaming meta file

    Description: Solves bug where block became inaccessible in certain failure
                 conditions (particularly network partitions). Observed under
                 HBase workload at user site.
    Reason: Potential loss of synced data when write pipeline fails
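
The HDFS-1260 fix is about ordering: every check must run before the meta-file rename, because the rename is the commit point and a rename followed by a failed check can leave the block unreadable. The sketch below assumes the 0.20 meta-file naming blk_<blockId>_<genStamp>.meta; try_update_block here is a hypothetical stand-in, and the specific checks are illustrative assumptions rather than the exact HDFS validation.

```python
import os
import re

META_RE = re.compile(r"^blk_(-?\d+)_(\d+)\.meta$")


def try_update_block(meta_path: str, new_gen_stamp: int, new_length: int,
                     block_length: int) -> str:
    """Validate first, then rename the meta file to the new generation stamp.

    A failed check raises before anything on disk changes, so the old meta
    file (and therefore the block) stays readable.
    """
    directory, name = os.path.split(meta_path)
    m = META_RE.match(name)
    if m is None:
        raise ValueError(f"not a block meta file: {name}")
    block_id, old_gen_stamp = int(m.group(1)), int(m.group(2))
    if new_gen_stamp <= old_gen_stamp:
        raise ValueError(f"generation stamp {new_gen_stamp} is not newer than {old_gen_stamp}")
    if new_length > block_length:
        raise ValueError(f"cannot extend block from {block_length} to {new_length} bytes")
    new_path = os.path.join(directory, f"blk_{block_id}_{new_gen_stamp}.meta")
    os.rename(meta_path, new_path)  # the only mutating step, done last
    return new_path
```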

On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas <suresh@hortonworks.com> wrote:
> I also propose following jiras, which are non append related bug fixes from
> 0.20-append branch:
>   - HDFS-1164. TestHdfsProxy is failing.
>   - HDFS-1211. Block receiver should not log "rewind" packets at INFO level.
>   - HDFS-1118. Fix socket leak on DFSClient.
>   - HDFS-1210. DFSClient should log exception when block recovery fails.
>   - HDFS-606. Fix ConcurrentModificationException in invalidateCorruptReplicas.
>   - HDFS-561. Fix write pipeline READ_TIMEOUT.
>   - HDFS-1202. DataBlockScanner throws NPE when updated before initialized.
> Risk Level:
> These are useful bugfixes from append branch and are not big changes to the
> code base.
> These jiras have already been merged into 0.20-security branch.

Todd Lipcon
Software Engineer, Cloudera
