hadoop-general mailing list archives

From Matt Foley <mfo...@hortonworks.com>
Subject Re: Add Append-HBase support in upcoming 20.205
Date Fri, 02 Sep 2011 21:34:08 GMT
Hi Todd,
Thank you, this is tremendously valuable input!  I'll have to look in detail
at each of these ten jiras, and will get back to the list with more info
shortly.
--Matt

On Fri, Sep 2, 2011 at 1:03 PM, Todd Lipcon <todd@cloudera.com> wrote:

> The following other JIRAs have been committed in CDH for 18 months or
> so, for the purpose of HBase. You may want to consider backporting
> them as well - many were never committed to 0.20-append due to lack of
> reviews by HDFS committers at the time.
>
>    HDFS-1056. Fix possible multinode deadlocks during block recovery
> when using ephemeral dataxceiv
>
>    Description: Fixes the logic by which datanodes identify local RPC
>                 targets during block recovery for the case when the
>                 datanode is configured with an ephemeral data
>                 transceiver port.
>    Reason: Potential internode deadlock for clusters using ephemeral ports
>
>
>    HADOOP-6722. Workaround a TCP spec quirk by not allowing
> NetUtils.connect to connect to itself
>
>    Description: TCP's ephemeral port assignment results in the possibility
>                 that a client can connect back to its own outgoing socket,
>                 resulting in failed RPCs or datanode transfers.
>    Reason: Fixes intermittent errors in cluster testing with ephemeral
>            IPC/transceiver ports on datanodes.
>
>    HDFS-1122. Don't allow client verification to prematurely add
> inprogress blocks to DataBlockScanner
>
>    Description: When a client reads a block that is also open for writing,
>                 it should not add it to the datanode block scanner.
>                 If it does, the block scanner can incorrectly mark the
>                 block as corrupt, causing data loss.
>    Reason: Potential dataloss with concurrent writer-reader case.
>
>    HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
>
>    Description: Miscellaneous code cleanup and logging changes, including:
>     - Slight cleanup to recoverFile() function in TestFileAppend4
>     - Improve error messages on OP_READ_BLOCK
>     - Some comment cleanup in FSNamesystem
>     - Remove toInodeUnderConstruction (was not used)
>     - Add some checks for null blocks in FSNamesystem to avoid a
>       possible NPE
>     - Only log "inconsistent size" warnings at WARN level for
>       non-under-construction blocks.
>     - Redundant addStoredBlock calls are also not worthy of WARN level
>     - Add some extra information to a warning in ReplicationTargetChooser
>    Reason: Improves diagnosis of error cases and clarity of code
>
>
>    HDFS-1242. Add unit test for the appendFile race condition /
> synchronization bug fixed in HDFS-142
>
>    Reason: Test coverage for previously applied patch.
>
>    HDFS-1218. Replicas that are recovered during DN startup should
> not be allowed to truncate better replicas.
>
>    Description: If a datanode loses power and then recovers, its replicas
>                 may be truncated due to the recovery of the local FS
>                 journal. This patch ensures that a replica truncated by
>                 a power loss does not truncate the block on HDFS.
>    Reason: Potential dataloss bug uncovered by power failure simulation
>
>    HDFS-915. Write pipeline hangs for too long when ResponseProcessor
> hits timeout
>
>    Description: Previously, the write pipeline would hang for the entire
>                 write timeout when it encountered a read timeout (e.g.,
>                 due to a network connectivity issue). This patch
>                 interrupts the writing thread when a read error occurs.
>    Reason: Faster recovery from pipeline failure for HBase and other
>            interactive applications.
>
>
>    HDFS-1186. Writers should be interrupted when recovery is started,
> not when it's completed.
>
>    Description: When the write pipeline recovery process is initiated, this
>                 interrupts any concurrent writers to the block under
>                 recovery. This prevents a case where some edits may be
>                 lost if the writer has lost its lease but continues to
>                 write (e.g., due to a garbage collection pause).
>    Reason: Fixes a potential dataloss bug
>
>
> commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
> Author: Todd Lipcon <todd@lipcon.org>
> Date:   Sun Jun 13 23:02:38 2010 -0700
>
>    HDFS-1197. Received blocks should not be added to block map
> prematurely for under construction files
>
>    Description: Fixes a possible dataloss scenario when using append() on
>                 real-life clusters. Also augments unit tests to uncover
>                 similar bugs in the future by simulating latency when
>                 reporting blocks received by datanodes.
>    Reason: Append support dataloss bug
>    Author: Todd Lipcon
>
>
>    HDFS-1260. tryUpdateBlock should do validation before renaming meta file
>
>    Description: Solves a bug where a block became inaccessible in certain
>                 failure conditions (particularly network partitions).
>                 Observed under HBase workload at user site.
>    Reason: Potential loss of synced data when write pipeline fails
>
>
> On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas <suresh@hortonworks.com>
> wrote:
> > I also propose the following jiras, which are non-append-related bug
> > fixes from the 0.20-append branch:
> >
> >   - HDFS-1164. TestHdfsProxy is failing.
> >   - HDFS-1211. Block receiver should not log "rewind" packets at INFO
> >   level.
> >   - HDFS-1118. Fix socketleak on DFSClient.
> >   - HDFS-1210. DFSClient should log exception when block recovery fails.
> >   - HDFS-606. Fix ConcurrentModificationException in
> >   invalidateCorruptReplicas.
> >   - HDFS-561. Fix write pipeline READ_TIMEOUT.
> >   - HDFS-1202.  DataBlockScanner throws NPE when updated before
> >   initialized.
> >
> > Risk Level:
> > These are useful bugfixes from the append branch and are not big changes
> > to the code base.
> >
> > These jiras have already been merged into 0.20-security branch.
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
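
For anyone reading the HADOOP-6722 entry above without the patch in front of
them, here is a minimal sketch of the kind of check it describes: after
connecting, compare the socket's local and remote endpoints and treat a match
as a failed connection. This is an illustration only, not the committed
patch. The class name SelfConnectGuard, the helper isSelfConnection, and the
exception message are all hypothetical; only the java.net calls are real.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration of a self-connect guard; not Hadoop's NetUtils.
public class SelfConnectGuard {

    // True when both ends of a connection are the same address and port,
    // i.e. the socket has connected back to its own outgoing endpoint.
    public static boolean isSelfConnection(InetAddress localAddr, int localPort,
                                           InetAddress remoteAddr, int remotePort) {
        return localPort == remotePort && localAddr.equals(remoteAddr);
    }

    // Connects, then rejects the connection if it turned out to be a loopback
    // onto the socket's own ephemeral port.
    public static void connect(Socket socket, InetSocketAddress endpoint, int timeoutMs)
            throws IOException {
        socket.connect(endpoint, timeoutMs);
        if (isSelfConnection(socket.getLocalAddress(), socket.getLocalPort(),
                             socket.getInetAddress(), socket.getPort())) {
            socket.close();
            throw new ConnectException(
                "Connection to " + endpoint + " looped back to its own socket");
        }
    }

    public static void main(String[] args) throws IOException {
        // Ordinary case: a listener exists, so the client's ephemeral port
        // differs from the server port and the guard does not fire.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            connect(client,
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    1000);
            System.out.println("connected: " + client.isConnected());
        }
    }
}
```

The guard runs after connect() returns because the local port is only known
once the kernel has assigned it; the self-connect case arises when that
assigned ephemeral port happens to equal the target port and nothing is
listening there.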
