hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3510) Fix FSEditLog pre-allocation
Date Tue, 19 Jun 2012 22:25:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397117#comment-13397117 ]

Suresh Srinivas commented on HDFS-3510:
---------------------------------------

Thanks for adding description of the solution.

bq. Preallocation was introduced by Dhruba Borthakur in HADOOP-2330 in 2008.
I know this quite well, which is why I pointed out that the description of this JIRA needs
to be updated. BTW, please address my earlier comment about updating the JIRA description.
Instead of saying that preallocation is flawed, you may want to state that it can be improved
to avoid partial writes when the disk becomes full. Please also change the JIRA title from
"Fix FSEditLog pre-allocation" to "Improve FSEditLog pre-allocation".

Some comments on the patch:
# Please retain the pre-allocation method in EditLogFileOutputStream. This functionality does
not belong in EditsDoubleBuffer.
# Is {{fc.position(fc.position() - 1); // skip back the end-of-file marker}} no longer required?
# Please add the debug logs back; they are useful when diagnosing problems.
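A minimal sketch of what the line in comment 2 does, assuming (as the quoted comment implies) that each flush writes the batched edits followed by a single OP_INVALID end-of-file marker. The class `EofMarkerSketch` and its methods are hypothetical; only the `fc.position(fc.position() - 1)` step comes from the code under review.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Hadoop code: each flush writes the edits followed by
// one end-of-file marker byte, then steps back over the marker so that the
// next flush overwrites it.
final class EofMarkerSketch implements AutoCloseable {
  static final byte OP_INVALID = (byte) 0xFF; // marker byte (-1 as a signed byte)
  private final FileChannel fc;

  EofMarkerSketch(Path path) throws IOException {
    fc = FileChannel.open(path, StandardOpenOption.CREATE,
        StandardOpenOption.READ, StandardOpenOption.WRITE);
  }

  void flushAndSync(byte[] edits) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(edits.length + 1);
    buf.put(edits).put(OP_INVALID).flip();
    while (buf.hasRemaining()) {
      fc.write(buf); // relative write: advances fc.position()
    }
    fc.force(false); // force data only; file metadata is not forced
    fc.position(fc.position() - 1); // skip back the end-of-file marker
  }

  long position() throws IOException { return fc.position(); }

  long size() throws IOException { return fc.size(); }

  @Override
  public void close() throws IOException { fc.close(); }
}
```

After two flushes of `{1, 2, 3}` and `{4, 5}`, the file holds `1 2 3 4 5 0xFF`: the second batch overwrote the first marker. In this sketch, dropping the step back would leave a marker byte between consecutive batches.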

Under heavy load or long disk I/O times, a large number of edits could still be batched
and synced together. Some edit log operations, such as the Close operation, can be quite large.
In that case, the preallocated 1 MB may still not be sufficient, even though you have eliminated
most of the partial-write cases.
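To put numbers on this concern, here is a small arithmetic sketch (class and method names are hypothetical) of how many 1 MB chunks must be appended so that a batch fits in the already-reserved space. If a flush appends at most one chunk, any batch whose shortfall exceeds 1 MB (for example, a 1.5 MB batch arriving with no reserved space left) is still partially written past the preallocated region.

```java
// Hypothetical helper: number of fixed-size preallocation chunks that must be
// appended so that a batch of `batchBytes` fits in the `reservedBytes` already
// available past the write position. A policy that appends at most one chunk
// per flush covers the batch only when this returns 0 or 1.
final class PreallocMath {
  static final long CHUNK = 1L << 20; // 1 MB

  static long chunksNeeded(long reservedBytes, long batchBytes) {
    long shortfall = batchBytes - reservedBytes;
    if (shortfall <= 0) {
      return 0; // the batch already fits
    }
    return (shortfall + CHUNK - 1) / CHUNK; // ceiling division
  }
}
```

For example, `chunksNeeded(0, 1_572_864)` returns 2, while `chunksNeeded(4096, 4000)` returns 0.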

                
> Fix FSEditLog pre-allocation
> ----------------------------
>
>                 Key: HDFS-3510
>                 URL: https://issues.apache.org/jira/browse/HDFS-3510
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 1.0.0, 2.0.1-alpha
>
>         Attachments: HDFS-3510-b1.001.patch, HDFS-3510-b1.002.patch, HDFS-3510.001.patch,
>                      HDFS-3510.003.patch, HDFS-3510.004.patch, HDFS-3510.004.patch, HDFS-3510.006.patch,
>                      HDFS-3510.007.patch, HDFS-3510.008.patch, HDFS-3510.009.patch
>
>
> In the FSEditLog, we want to avoid running out of space in the middle of writing an edit
> log operation to the disk. We do this through a process called "preallocation": reserving
> space on the disk for upcoming edit log entries before we begin writing them.
> The idea is that if we are going to encounter an out-of-disk-space condition, we don't
> want it to happen in the middle of writing valid data. Instead, we want it to happen in the
> middle of writing padding bytes. The edit log uses bytes with the value 0xff (-1 as a
> signed byte) as padding. These bytes correspond to FSEditLogOp.OP_INVALID.
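The padding idea can be modelled with a toy in-memory disk. This is a minimal sketch under simplifying assumptions, not FSEditLog code: a fixed-capacity byte array stands in for the disk, the class name is hypothetical, and reserving space either succeeds completely or fails without writing anything. (A real disk could fill up partway through the padding; that is still harmless, because only padding is lost.)

```java
import java.util.Arrays;

// Toy in-memory model (hypothetical, not Hadoop code) of an edit log file on a
// disk with fixed capacity. Space is reserved with OP_INVALID padding before
// any edit is written into it, so "disk full" can only surface while reserving.
final class ToyEditLog {
  static final byte OP_INVALID = (byte) 0xFF; // == -1 as a signed Java byte
  private final byte[] disk;
  private int fileSize = 0; // bytes reserved for the file (edits + padding)
  private int dataEnd = 0;  // end of valid edits

  ToyEditLog(int diskCapacity) {
    disk = new byte[diskCapacity];
  }

  /** Appends n padding bytes; returns false (writing nothing) if the disk is full. */
  boolean preallocate(int n) {
    if (fileSize + n > disk.length) {
      return false;
    }
    Arrays.fill(disk, fileSize, fileSize + n, OP_INVALID);
    fileSize += n;
    return true;
  }

  /** Writes an edit into already-reserved space, so it never hits "disk full". */
  void writeEdit(byte[] edit) {
    if (dataEnd + edit.length > fileSize) {
      throw new IllegalStateException("edit does not fit in preallocated space");
    }
    System.arraycopy(edit, 0, disk, dataEnd, edit.length);
    dataEnd += edit.length;
  }

  byte[] fileContents() {
    return Arrays.copyOf(disk, fileSize);
  }
}
```

With a 10-byte disk, reserving 8 bytes and writing a 3-byte edit leaves `1 2 3` followed by five 0xFF bytes; a second 8-byte reservation then fails cleanly, and the valid edit is untouched.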
> The current preallocation strategy is flawed. Although we preallocate a very large chunk
> at a time (1 megabyte, in fact), we only do this preallocation when the write position is
> within 4096 bytes of the end of the file. This means that the effective preallocation
> length is only 4096 bytes, and a batch of edit log entries could easily be larger than
> this. There is evidence that this has caused problems in the field for end users.
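The old rule can be modelled in a few lines. This is a hypothetical sketch of the policy as the description states it (preallocate only once the write position is within 4096 bytes of the end of the file), with the 1 MB of padding starting at the write position as in the diagrams. It is not the actual FSEditLog code.

```java
// Hypothetical model of the old rule: extend the file by 1 MB of padding
// (starting at the write position) only when the write position is within
// 4096 bytes of the end of the file.
final class OldPreallocPolicy {
  static final long CHUNK = 1L << 20; // 1 MB
  static final long THRESHOLD = 4096;

  /** File size after the old rule runs just before a write at `position`. */
  static long sizeAfterPreallocate(long position, long fileSize) {
    // Inside the threshold, fileSize <= position + 4096 < position + CHUNK.
    return (position + THRESHOLD >= fileSize) ? position + CHUNK : fileSize;
  }

  /** True if a batch of `batchBytes` would run past the preallocated region. */
  static boolean overruns(long position, long fileSize, long batchBytes) {
    return position + batchBytes > sizeAfterPreallocate(position, fileSize);
  }
}
```

With the write position 4097 bytes from the end, `sizeAfterPreallocate(0, 4097)` leaves the file at 4097 bytes, so `overruns(0, 4097, 5000)` is true: a batch of under 5 KB is partially written into unreserved space.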
> Here is a visual illustration of the old preallocation strategy:
> {code}
> first write
> |
> V <----- 1 MB ----->
> +--+---------------+
> |__|FFFFFFFFFFFFFFF|
> +--+---------------+
>     second write
>     |
>     V
> +--+------+--------+
> |__|______|FFFFFFFF|
> +--+------+--------+
>            third write
>            |
>            V
> +--+------+------+-+
> |__|______|______|_|
> +--+------+------+-+
>                   fourth write
>                   | (NOT preallocated)
>                   V
> +--+------+------+-+
> |__|______|______|________
> +--+------+------+-+
>                           fifth write
>                           |
>                           V<--- 1 MB -->
> +--+------+------+--------+---+--------+
> |__|______|______|________|___|FFFFFFFF|
> +--+------+------+--------+---+--------+
> {code}
> And here is the new preallocation strategy:
> {code}
> first write
> |
> V <----- 1 MB ----->
> +--+---------------+
> |__|FFFFFFFFFFFFFFF|
> +--+---------------+
>     second write
>     |
>     V
> +--+------+--------+
> |__|______|FFFFFFFF|
> +--+------+--------+
>            third write
>            |
>            V
> +--+------+------+-+
> |__|______|______|_|
> +--+------+------+-+
>                   fourth write
>                   |
>                   V <------ 1MB-->
> +--+------+------+--------+------+
> |__|______|______|________|      |
> +--+------+------+--------+------+
>                           fifth write
>                           |
>                           V
> +--+------+------+--------+---+--+
> |__|______|______|________|___|  |
> +--+------+------+--------+---+--+
> {code}
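The new strategy drawn above can be sketched against a java.nio `FileChannel`. This is a minimal sketch under stated assumptions, not the code from the attached patches: the class name is hypothetical, java.util.logging stands in for Hadoop's logging, and padding is appended at the end of the file in whole 1 MB chunks until the pending batch fits, which is one way to cover batches larger than 1 MB.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the new rule: before a flush of `bytesToWrite` bytes,
// append whole 1 MB chunks of OP_INVALID padding at the end of the file until
// the space between the write position and the end of the file suffices.
final class NewPreallocPolicy {
  static final byte OP_INVALID = (byte) 0xFF; // padding byte
  static final int CHUNK = 1 << 20;           // 1 MB
  private static final Logger LOG = Logger.getLogger(NewPreallocPolicy.class.getName());
  private static final byte[] PADDING = new byte[CHUNK];

  static {
    Arrays.fill(PADDING, OP_INVALID);
  }

  /** Returns the number of padding bytes appended (0 if the batch already fits). */
  static long preallocate(FileChannel fc, long bytesToWrite) throws IOException {
    long size = fc.size();
    long need = bytesToWrite - (size - fc.position()); // shortfall in reserved space
    long total = 0;
    while (need > 0) {
      ByteBuffer fill = ByteBuffer.wrap(PADDING);
      while (fill.hasRemaining()) {
        size += fc.write(fill, size); // positional write: fc.position() is unchanged
      }
      need -= CHUNK;
      total += CHUNK;
    }
    // Debug-level log (java.util.logging FINE stands in for Hadoop's debug log).
    if (total > 0 && LOG.isLoggable(Level.FINE)) {
      LOG.fine("Preallocated " + total + " bytes at the end of the edit log (new size " + size + ")");
    }
    return total;
  }
}
```

Called just before each flush with the size of the batch being written, it appends nothing while the batch fits and otherwise reserves enough padding for the whole batch, so in this sketch running out of space surfaces while writing padding rather than edits.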

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
