hadoop-general mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: Add Append-HBase support in upcoming 20.205
Date Fri, 02 Sep 2011 21:48:04 GMT
Hey Matt,

You can see the full change log here:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2+923.97.CHANGES.txt

Most changes made for HBase say so in the "Reason" field.
There's a directory in the source tarball that contains all the
individual patches broken out.

Cheers,
Eli

On Fri, Sep 2, 2011 at 2:34 PM, Matt Foley <mfoley@hortonworks.com> wrote:
> Hi Todd,
> Thank you, this is tremendously valuable input!  I'll have to look in detail
> at each of these ten jiras,
> and will get back to the list with more info shortly.
> --Matt
>
> On Fri, Sep 2, 2011 at 1:03 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> The following other JIRAs have been committed in CDH for 18 months or
>> so, for the purpose of HBase. You may want to consider backporting
>> them as well - many were never committed to 0.20-append due to lack of
>> reviews by HDFS committers at the time.
>>
>>    HDFS-1056. Fix possible multinode deadlocks during block recovery
>> when using ephemeral dataxceiv
>>
>>    Description: Fixes the logic by which datanodes identify local RPC
>> targets
>>                 during block recovery for the case when the datanode
>>                 is configured with an ephemeral data transceiver port.
>>    Reason: Potential internode deadlock for clusters using ephemeral ports
>>
>>
>>    HADOOP-6722. Workaround a TCP spec quirk by not allowing
>> NetUtils.connect to connect to itself
>>
>>    Description: TCP's ephemeral port assignment results in the possibility
>>                 that a client can connect back to its own outgoing socket,
>>                 resulting in failed RPCs or datanode transfers.
>>    Reason: Fixes intermittent errors in cluster testing with ephemeral
>>            IPC/transceiver ports on datanodes.
>>
>>    HDFS-1122. Don't allow client verification to prematurely add
>> inprogress blocks to DataBlockScanner
>>
>>    Description: When a client reads a block that is also open for writing,
>>                 it should not add it to the datanode block scanner.
>>                 If it does, the block scanner can incorrectly mark the
>>                 block as corrupt, causing data loss.
>>    Reason: Potential dataloss with concurrent writer-reader case.
>>
>>    HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
>>
>>    Description: Miscellaneous code cleanup and logging changes, including:
>>     - Slight cleanup to recoverFile() function in TestFileAppend4
>>     - Improve error messages on OP_READ_BLOCK
>>     - Some comment cleanup in FSNamesystem
>>     - Remove toInodeUnderConstruction (was not used)
>>     - Add some checks for null blocks in FSNamesystem to avoid a possible
>>       NPE
>>     - Only log "inconsistent size" warnings at WARN level for
>>       non-under-construction blocks.
>>     - Redundant addStoredBlock calls are also not worthy of WARN level
>>     - Add some extra information to a warning in ReplicationTargetChooser
>>    Reason: Improves diagnosis of error cases and clarity of code
>>
>>
>>    HDFS-1242. Add unit test for the appendFile race condition /
>> synchronization bug fixed in HDFS-142
>>
>>    Reason: Test coverage for previously applied patch.
>>
>>    HDFS-1218. Replicas that are recovered during DN startup should
>> not be allowed to truncate better replicas.
>>
>>    Description: If a datanode loses power and then recovers, its replicas
>>                 may be truncated due to the recovery of the local FS
>>                 journal. This patch ensures that a replica truncated by
>>                 a power loss does not truncate the block on HDFS.
>>    Reason: Potential dataloss bug uncovered by power failure simulation
>>
>>    HDFS-915. Write pipeline hangs for too long when ResponseProcessor
>> hits timeout
>>
>>    Description: Previously, the write pipeline would hang for the entire
>>                 write timeout when it encountered a read timeout (eg due
>>                 to a network connectivity issue). This patch interrupts
>>                 the writing thread when a read error occurs.
>>    Reason: Faster recovery from pipeline failure for HBase and other
>>            interactive applications.
>>
>>
>>    HDFS-1186. Writers should be interrupted when recovery is started,
>> not when it's completed.
>>
>>    Description: When the write pipeline recovery process is initiated, this
>>                 interrupts any concurrent writers to the block under
>>                 recovery. This prevents a case where some edits may be
>>                 lost if the writer has lost its lease but continues to
>>                 write (eg due to a garbage collection pause)
>>    Reason: Fixes a potential dataloss bug
>>
>>
>> commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
>> Author: Todd Lipcon <todd@lipcon.org>
>> Date:   Sun Jun 13 23:02:38 2010 -0700
>>
>>    HDFS-1197. Received blocks should not be added to block map
>> prematurely for under construction files
>>
>>    Description: Fixes a possible dataloss scenario when using append() on
>>                 real-life clusters. Also augments unit tests to uncover
>>                 similar bugs in the future by simulating latency when
>>                 reporting blocks received by datanodes.
>>    Reason: Append support dataloss bug
>>    Author: Todd Lipcon
>>
>>
>>    HDFS-1260. tryUpdateBlock should do validation before renaming meta file
>>
>>    Description: Solves bug where block became inaccessible in certain
>>                 failure conditions (particularly network partitions).
>>                 Observed under HBase workload at user site.
>>    Reason: Potential loss of synced data when write pipeline fails
>>
>>
>> On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas <suresh@hortonworks.com>
>> wrote:
>> > I also propose the following jiras, which are non-append-related bug
>> > fixes from the 0.20-append branch:
>> >
>> >   - HDFS-1164. TestHdfsProxy is failing.
>> >   - HDFS-1211. Block receiver should not log "rewind" packets at INFO
>> >   level.
>> >   - HDFS-1118. Fix socket leak on DFSClient.
>> >   - HDFS-1210. DFSClient should log exception when block recovery fails.
>> >   - HDFS-606. Fix ConcurrentModificationException in
>> >   invalidateCorruptReplicas.
>> >   - HDFS-561. Fix write pipeline READ_TIMEOUT.
>> >   - HDFS-1202.  DataBlockScanner throws NPE when updated before
>> >   initialized.
>> >
>> > Risk Level:
>> > These are useful bugfixes from append branch and are not big changes to
>> > the code base.
>> >
>> > These jiras have already been merged into 0.20-security branch.
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
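
The self-connect quirk described under HADOOP-6722 in the quoted list can be illustrated with a minimal Java sketch. This is not the actual patch: the class name SelfConnectCheck and its methods are hypothetical, and the sketch only assumes that a self-connection shows up as a socket whose local address and port equal its remote address and port.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; not code from Hadoop's NetUtils.
final class SelfConnectCheck {

    // True when a connected socket's local endpoint equals its remote endpoint,
    // which is how a TCP self-connection appears to the application.
    static boolean isSelfConnected(Socket s) {
        return s.getLocalPort() == s.getPort()
            && s.getLocalAddress().equals(s.getInetAddress());
    }

    // Connects, then refuses the connection if the socket ended up talking to itself.
    static void connect(Socket s, InetSocketAddress target, int timeoutMs)
            throws IOException {
        s.connect(target, timeoutMs);
        if (isSelfConnected(s)) {
            s.close();
            throw new ConnectException("Rejected self-connection to " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        // A listener on an ephemeral port stands in for a datanode.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket()) {
            InetSocketAddress target = new InetSocketAddress(
                InetAddress.getLoopbackAddress(), server.getLocalPort());
            connect(client, target, 1000);
            System.out.println("self-connected: " + isSelfConnected(client));
        }
    }
}
```

Checking after connect() rather than before is deliberate in this sketch: the client's ephemeral local port is only assigned during the connect, so the collision cannot be detected earlier.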

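The behaviour described under HDFS-915 (interrupting the writing thread when a read error occurs, instead of letting it wait out the full write timeout) can be sketched in a few lines of Java. This is a simplified model, not DFSClient code: the class PipelineInterruptSketch, the method runOnce, and both timing parameters are hypothetical, and the blocked write is simulated with Thread.sleep.

```java
// Hypothetical model of "interrupt the writer on read error"; not Hadoop code.
final class PipelineInterruptSketch {

    // Returns the elapsed milliseconds until the simulated writer stops waiting.
    static long runOnce(long writeTimeoutMs, long readErrorAfterMs)
            throws InterruptedException {
        long start = System.nanoTime();

        // Simulated writer: would block for the whole write timeout if left alone.
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(writeTimeoutMs);
            } catch (InterruptedException e) {
                // Woken early by the response processor; pipeline recovery could begin here.
            }
        });

        // Simulated response processor: hits a read error, then interrupts the writer.
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(readErrorAfterMs);
            } catch (InterruptedException e) {
                return;
            }
            writer.interrupt();
        });

        writer.start();
        responder.start();
        writer.join();
        responder.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = runOnce(5000, 100);
        System.out.println("writer released after ~" + elapsed + " ms");
    }
}
```

With the interrupt in place the writer returns shortly after the simulated read error rather than after the full write timeout, which is the faster recovery the "Reason" field refers to.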