hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3023) Optimize entries in edits log for persistBlocks calls
Date Tue, 28 Feb 2012 05:18:49 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3023:

    Attachment: hdfs-3023-HDFS-1623.txt

Attached patch implements optimizations 1-3 above. I didn't do optimization #4 since it's
a bit more complicated and will only really help with files that are several blocks long.
I'd like to leave it for future work.

The patch is against the HA branch, since the branch is soon to be merged and the code around
OP_ADD, etc, differs a bit. Rather than do it twice, I figured I'd just work on HA branch.

I'll do a round of benchmarks on this patch tomorrow.

> Optimize entries in edits log for persistBlocks calls
> -----------------------------------------------------
>                 Key: HDFS-3023
>                 URL: https://issues.apache.org/jira/browse/HDFS-3023
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node, performance
>    Affects Versions: HA branch (HDFS-1623), 0.23.2
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3023-HDFS-1623.txt
>
> One of the performance issues noticed in the HA branch is due to the much larger edit
> logs, now that we are writing OP_ADD transactions to the edit log on every block allocation.
> We can condense these calls down in several ways:
> 1) use variable-length integers for the block list length, size, and genstamp (most of
> these end up fitting in far less than 8 bytes)
> 2) use delta-coding for the genstamp and block size for any blocks after the first block
> (most blocks will be the same size and have only slightly higher genstamps)
> 3) introduce a new OP_UPDATE_BLOCKS transaction that doesn't re-serialize metadata information
> like lease owner, permissions, etc
> 4) allow OP_UPDATE_BLOCKS to only re-serialize the blocks that have changed for a given
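
To make items 1 and 2 concrete, here is a minimal Python sketch of variable-length integers combined with delta-coding for a block list. It is an illustration only, not the format used by the attached patch: the 7-bits-per-byte varint, the zigzag mapping for negative deltas, the fixed 8-byte block id, and the names `encode_blocks` and `decode_blocks` are all assumptions made for this example.

```python
import struct


def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned so small negatives stay small."""
    return (n << 1) ^ (n >> 63)


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return (z >> 1) ^ -(z & 1)


def write_varint(out: bytearray, value: int) -> None:
    """Append a non-negative int, 7 bits per byte; high bit means 'more follows'."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(buf, pos: int):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_blocks(blocks) -> bytes:
    """Encode a list of (block_id, num_bytes, genstamp) tuples.

    Hypothetical layout: varint count, then per block a fixed 8-byte id
    followed by size and genstamp. The first block stores absolute values;
    later blocks store zigzag-encoded deltas from the previous block.
    """
    out = bytearray()
    write_varint(out, len(blocks))
    prev_size = prev_gs = 0
    for i, (block_id, size, gs) in enumerate(blocks):
        out += struct.pack(">q", block_id)
        if i == 0:
            write_varint(out, size)
            write_varint(out, gs)
        else:
            write_varint(out, zigzag(size - prev_size))
            write_varint(out, zigzag(gs - prev_gs))
        prev_size, prev_gs = size, gs
    return bytes(out)


def decode_blocks(buf: bytes):
    """Inverse of encode_blocks()."""
    count, pos = read_varint(buf, 0)
    blocks = []
    prev_size = prev_gs = 0
    for i in range(count):
        (block_id,) = struct.unpack_from(">q", buf, pos)
        pos += 8
        raw_size, pos = read_varint(buf, pos)
        raw_gs, pos = read_varint(buf, pos)
        if i == 0:
            size, gs = raw_size, raw_gs
        else:
            size = prev_size + unzigzag(raw_size)
            gs = prev_gs + unzigzag(raw_gs)
        blocks.append((block_id, size, gs))
        prev_size, prev_gs = size, gs
    return blocks


# Made-up sample data: two full 128 MB blocks and a short final block.
sample = [
    (-7046029254386353131, 134217728, 1001),
    (4611686018427387904, 134217728, 1002),
    (1234567890123456789, 1000, 1003),
]
encoded = encode_blocks(sample)
fixed_width = 3 * 8 * len(sample)  # baseline: 8 bytes for each of 3 fields
print(len(encoded), "bytes encoded vs", fixed_width, "bytes fixed-width")
```

The block id stays at a fixed 8 bytes because the description only proposes variable-length encoding for the list length, size, and genstamp. The saving comes from the second block onward, where an unchanged size and a genstamp that grows by one each take a single byte.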
