hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1846) Don't fill preallocated portion of edits log with 0x00
Date Sun, 24 Apr 2011 04:52:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023631#comment-13023631 ]

Aaron T. Myers commented on HDFS-1846:
--------------------------------------

Hey guys, I've done some performance analysis, and here are the results. I'll post a patch
shortly (not intended for inclusion) so you can see how I did the analysis. If anyone
would like to try this patch on their own system, I'd be very curious to see the results,
since, as Nathan points out, the results can be affected by many factors.

{noformat}
----------------------------------------------------
Results for classic scheme:
Overall total ops: 100000 
Overall total time of all ops: 39224.0
Overall average time of op: 0.39224
Overall fastest op: 0
Overall slowest op: 223 
Preallocation total ops: 23
Preallocation total time of all ops: 24.0
Preallocation average time of op: 1.0434782608695652
Preallocation fastest op: 0
Preallocation slowest op: 6
Total time of slowest 1% of ops: 4858.0
Average time of slowest 1% of ops: 4.858
----------------------------------------------------
----------------------------------------------------
Results for new scheme: 
Overall total ops: 100000
Overall total time of all ops: 37192.0
Overall average time of op: 0.37192
Overall fastest op: 0 
Overall slowest op: 231
Preallocation total ops: 23 
Preallocation total time of all ops: 291.0 
Preallocation average time of op: 12.652173913043478
Preallocation fastest op: 10 
Preallocation slowest op: 21
Total time of slowest 1% of ops: 4670.0
Average time of slowest 1% of ops: 4.67
----------------------------------------------------
{noformat}
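
For anyone who wants to reproduce the "slowest 1%" lines without waiting for the patch, here is a
minimal sketch of how such figures could be derived from a list of per-op times. It is not the
code from the patch; the class and method names are hypothetical, and the input in main() is
made-up data used only to show the calculation.

```java
import java.util.Arrays;

public class OpStats {
    // Number of ops in the slowest `percent` percent of `n` ops (integer math).
    static int slowestCount(int n, int percent) {
        return n * percent / 100;
    }

    // Total time of the slowest `percent` percent of ops.
    static long slowestTotal(long[] latencies, int percent) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted); // ascending, so the slowest ops are at the end
        int count = slowestCount(sorted.length, percent);
        long total = 0;
        for (int i = sorted.length - count; i < sorted.length; i++) {
            total += sorted[i];
        }
        return total;
    }

    // Average time of the slowest `percent` percent of ops.
    static double slowestAverage(long[] latencies, int percent) {
        int count = slowestCount(latencies.length, percent);
        return count == 0 ? 0.0 : (double) slowestTotal(latencies, percent) / count;
    }

    public static void main(String[] args) {
        // Made-up per-op times 1..200, purely for illustration.
        long[] latencies = new long[200];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = i + 1;
        }
        System.out.println(slowestTotal(latencies, 1));   // 399 (199 + 200)
        System.out.println(slowestAverage(latencies, 1)); // 199.5
    }
}
```

With 100000 ops, the slowest 1% is 1000 ops, which matches the reported totals and averages
(e.g. 4858.0 / 1000 = 4.858).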

I personally ran this test several times on my own system, and the results from this particular
test run are pretty representative. There wasn't much variation across runs.

As you can see from this data, with the new scheme, performing an edit which causes an on-disk
preallocation is indeed slower - roughly 12x slower on average (12.65ms vs. 1.04ms) than a
similar op using the previous scheme. However, I was correct that the time taken for the
average op is indeed lower with the new scheme than the old (0.372ms vs. 0.392ms). Also worth
noting that the average time taken for the slowest 1% of ops is lower with the new scheme: only
23 of the 100000 ops during the test run triggered a preallocation, so their extra cost is
outweighed by the improvement in the other ops.
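
To make the comparison concrete, the arithmetic can be checked with a few lines of Java. This is
only a sketch: every input is copied from the results above, and the class and method names are
hypothetical.

```java
public class PreallocNumbers {
    // Ratio of a new-scheme figure to the corresponding classic-scheme figure.
    static double ratio(double newValue, double oldValue) {
        return newValue / oldValue;
    }

    // Percentage by which newValue is lower than oldValue.
    static double percentLower(double newValue, double oldValue) {
        return 100.0 * (oldValue - newValue) / oldValue;
    }

    public static void main(String[] args) {
        // Average preallocation op: total time / 23 ops for each scheme.
        System.out.printf("Preallocation op slowdown: %.1fx%n",
                ratio(291.0 / 23, 24.0 / 23));
        // Overall total time of all 100000 ops.
        System.out.printf("Average op improvement: %.1f%%%n",
                percentLower(37192.0, 39224.0));
        // Total time of the slowest 1% of ops.
        System.out.printf("Slowest 1%% improvement: %.1f%%%n",
                percentLower(4670.0, 4858.0));
        // Extra time spent in preallocating ops vs. time saved across the whole run.
        System.out.printf("Extra preallocation time: %.0f%n", 291.0 - 24.0);
        System.out.printf("Total time saved overall: %.0f%n", 39224.0 - 37192.0);
    }
}
```

The last two lines show the trade-off directly: the extra time spent in the 23 preallocating ops
is small next to the time saved over the whole run.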

I'm of the opinion that the increased latency of the preallocation-inducing ops is worth the
performance improvement of the average op and the extra durability this patch would provide.
The worst increase in latency from an op which happens to induce a preallocation is ~20ms,
which seems acceptable.

Also, curiously, in the course of this analysis I discovered that under both preallocation
schemes there are fairly consistently ~10 ops that each took ~200ms on my system.
These ops seem uncorrelated with preallocations. I'm leaving the cause of those as future
work.

> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-1846.0.txt
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for the NN
transaction log. That change seeks past the current end of the file and writes out some data,
which on most systems results in the intervening data in the file being filled with zeros.
Most underlying file systems have special handling for sparse files, and don't actually allocate
blocks on disk for blocks of a file which consist completely of 0x00.
> I've seen cases in the wild where the volume an edits dir is on fills up, resulting in
a partial final transaction being written out to disk. If you examine the bytes of this (now
corrupt) edits file, you'll see the partial final transaction followed by a lot of zeros,
suggesting that the preallocation previously succeeded before the volume ran out of space.
If we fill the preallocated space with something other than zeros, we'd likely see the failure
at preallocation time, rather than transaction-writing time, and so cause the NN to crash
earlier, without a partial transaction being written out.
> I also hypothesize that filling the preallocated space in the edits log with something
other than 0x00 will result in a performance improvement in NN throughput. I haven't tested
this yet, but I intend to as part of this JIRA.
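
The two preallocation styles discussed in the description can be sketched in plain Java NIO as
follows. This is an illustration under stated assumptions, not the NameNode code or the attached
patch: the class name, method names, and the 0xFF fill byte are all hypothetical, and whether the
sparse variant really leaves unallocated blocks depends on the underlying file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PreallocSketch {
    // Hypothetical fill byte; any non-zero value would serve for this illustration.
    static final byte FILL = (byte) 0xFF;

    // Sparse style: write a single byte at the new end of the file. The file grows,
    // but the file system may leave the gap as a hole with no blocks behind it.
    static void preallocateSparse(FileChannel fc, long extra) throws IOException {
        long newSize = fc.size() + extra;
        fc.write(ByteBuffer.wrap(new byte[] {0}), newSize - 1);
    }

    // Filled style: write `extra` non-zero bytes past the current end, so the space
    // has to be backed by real blocks and a full volume fails here, not later.
    static void preallocateFilled(FileChannel fc, int extra) throws IOException {
        byte[] chunk = new byte[extra];
        Arrays.fill(chunk, FILL);
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        long pos = fc.size();
        while (buf.hasRemaining()) {
            pos += fc.write(buf, pos);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("edits-sketch", ".bin");
        try (FileChannel fc = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            preallocateSparse(fc, 1024 * 1024);
            System.out.println("after sparse preallocation: " + fc.size());
            preallocateFilled(fc, 1024 * 1024);
            System.out.println("after filled preallocation: " + fc.size());
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Both calls grow the file by the same amount; the difference is only in whether the new region is
actually written, which is the behavior this issue proposes to change.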

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
