hadoop-hdfs-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
Date Mon, 20 Jan 2014 09:46:21 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-5776:

    Attachment: HDFS-5776-v3.txt

Thanks [~saint.ack@gmail.com] for your very detailed comments!
The attached v3 addresses the naming-related comments first.
I also have some performance numbers that I would like to share:

Test Env:
Hadoop 2.0 + HBase 0.94.11
3 DataNodes, each with only one disk for DFS read/write (yes, only one SATA disk; that's a little poor,
but perfect for the current test scenario, since we want to see the results when bad pread
performance occurs)
One RegionServer instance is up; I created the YCSB test table, loaded 20M records (each row
has 3 * 200 bytes), and finally ran a major compaction; the web UI showed only 1 storefile with
I used a single YCSB process with 10 threads issuing random read (get) requests. Each run lasted
10 minutes, and I manually cleared the HBase block cache and the OS cache (drop_caches) between
runs. The hedged-read thread pool size was kept at 50. Here are the detailed results:

1) dfs.dfsclient.hedged.read.threshold.millis = 500ms, dfs.dfsclient.hedged.read.sleep.interval.millis = 50ms.
In effect this should behave very much like the current implementation: per the results below,
almost all response times are under 500ms, so only very few requests are likely to go to a
secondary DN:
Throughput(ops/sec), 221.8174849820451
AverageLatency(us), 45055.13540070315
50thPercentileLatency(us), 24049
95thPercentileLatency(us), 165905
99thPercentileLatency(us), 270578

2) dfs.dfsclient.hedged.read.threshold.millis = 150ms, dfs.dfsclient.hedged.read.sleep.interval.millis = 50ms
Throughput(ops/sec), 257.6483818568037
AverageLatency(us), 38781.92033469773
50thPercentileLatency(us), 20534
95thPercentileLatency(us), 148194
99thPercentileLatency(us), 201110

3) dfs.dfsclient.hedged.read.threshold.millis = 100ms, dfs.dfsclient.hedged.read.sleep.interval.millis = 50ms
Throughput(ops/sec), 254.35882053973887
AverageLatency(us), 39291.54205264606
50thPercentileLatency(us), 20585
95thPercentileLatency(us), 150998
99thPercentileLatency(us), 151446

4) dfs.dfsclient.hedged.read.threshold.millis = 100ms, dfs.dfsclient.hedged.read.sleep.interval.millis = 20ms
Throughput(ops/sec), 237.20809410260168
AverageLatency(us), 42110.37126189875
50thPercentileLatency(us), 20246
95thPercentileLatency(us), 121147
99thPercentileLatency(us), 141207

In summary, in my heavily IO-bound random-read test scenario, the hedged read feature cut the
99th percentile latency from 270ms to 141ms, but it did not obviously improve average latency
or throughput. This is expected: the biggest benefit is against long-tail random-read latency,
which is a pretty common issue in HBase.
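The headline tail-latency numbers can be checked directly against the YCSB output above. The minimal sketch below treats run 1) as the baseline (per the note above that a 500ms threshold almost never triggers a hedged read) and run 4) as the hedged configuration; the class and method names are purely illustrative and not part of the patch.

```java
/** Illustrative helper: relative latency reduction from the YCSB numbers reported above. */
public final class TailLatencyReduction {

  /** Percentage by which latency dropped from the baseline run to the candidate run. */
  static double percentReduction(long baselineMicros, long candidateMicros) {
    return 100.0 * (baselineMicros - candidateMicros) / baselineMicros;
  }

  public static void main(String[] args) {
    // Percentile latencies in microseconds, copied from run 1) (baseline) and run 4) (hedged).
    long p99Baseline = 270578, p99Hedged = 141207;
    long p95Baseline = 165905, p95Hedged = 121147;
    System.out.printf("p99: %.1f%% lower%n", percentReduction(p99Baseline, p99Hedged));
    System.out.printf("p95: %.1f%% lower%n", percentReduction(p95Baseline, p95Hedged));
  }
}
```

With these inputs it prints a p99 reduction of about 47.8% and a p95 reduction of about 27.0%, i.e. the gain is concentrated in the tail, consistent with the summary.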

> Support 'hedged' reads in DFSClient
> -----------------------------------
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776.txt
> This is a placeholder for the HDFS-related parts backported from https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" and "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics we can export the metric values of interest into the client system (e.g. HBase's RegionServer metrics).
> The core logic is in the pread code path: based on the config items above, we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative.
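To make the pread dispatch described above concrete, here is a minimal, self-contained sketch of the hedged-read idea using plain java.util.concurrent instead of DFSClient. Everything in it is a hypothetical stand-in, not the code in HDFS-5776-v3.txt: the class HedgedReadSketch, the readReplica callback (standing in for a per-DataNode block read), the two counters (standing in for DFSQuorumReadMetrics), and the simulated demo helper. It assumes that a pool size or threshold of 0 or less disables hedging, and it does not model dfs.dfsclient.hedged.read.sleep.interval.millis.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical sketch of a hedged positional read; not the HDFS-5776 patch. */
public final class HedgedReadSketch {
  // Hypothetical stand-ins for DFSQuorumReadMetrics.
  final AtomicLong hedgedReadOps = new AtomicLong();  // times a second read was started
  final AtomicLong hedgedReadWins = new AtomicLong(); // times the second read finished first

  private final ExecutorService pool; // null means hedging is disabled
  private final long thresholdMillis;

  HedgedReadSketch(int threadPoolSize, long thresholdMillis) {
    // Assumption: a pool size or threshold <= 0 disables hedging.
    this.pool = (threadPoolSize > 0 && thresholdMillis > 0)
        ? Executors.newFixedThreadPool(threadPoolSize) : null;
    this.thresholdMillis = thresholdMillis;
  }

  /** Reads a byte range; readReplica is a hypothetical per-DataNode read. */
  byte[] pread(List<String> replicas, Function<String, byte[]> readReplica) throws Exception {
    if (pool == null || replicas.size() < 2) {
      // Plain path, analogous to fetchBlockByteRange: read one replica only.
      return readReplica.apply(replicas.get(0));
    }
    return hedgedPread(replicas, readReplica); // analogous to fetchBlockByteRangeSpeculative
  }

  private byte[] hedgedPread(List<String> replicas, Function<String, byte[]> readReplica)
      throws Exception {
    CompletionService<byte[]> service = new ExecutorCompletionService<>(pool);
    List<Future<byte[]>> started = new ArrayList<>();
    started.add(service.submit(() -> readReplica.apply(replicas.get(0))));
    try {
      // Give the primary replica up to thresholdMillis before hedging.
      Future<byte[]> done = service.poll(thresholdMillis, TimeUnit.MILLISECONDS);
      if (done != null) {
        return done.get();
      }
      // Primary is slow: start the same read against a second replica.
      hedgedReadOps.incrementAndGet();
      Future<byte[]> secondary = service.submit(() -> readReplica.apply(replicas.get(1)));
      started.add(secondary);
      done = service.take(); // whichever read completes first wins
      if (done == secondary) {
        hedgedReadWins.incrementAndGet();
      }
      return done.get(); // failed reads are not retried in this sketch
    } finally {
      for (Future<byte[]> f : started) {
        f.cancel(true); // interrupt the losing read; no-op for a finished one
      }
    }
  }

  void close() {
    if (pool != null) {
      pool.shutdownNow();
    }
  }

  /** Demo helper (hypothetical): each "DataNode" answers with its name after a fixed delay. */
  static Function<String, byte[]> simulated(Map<String, Long> delayMillis) {
    return dn -> {
      try {
        Thread.sleep(delayMillis.get(dn));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // this read lost the race and was cancelled
        throw new CancellationException("read from " + dn + " cancelled");
      }
      return dn.getBytes(StandardCharsets.UTF_8);
    };
  }
}
```

For example, with a 100ms threshold and simulated delays of 2000ms for "dn1" and 10ms for "dn2", pread returns "dn2" and bumps both counters; with a pool size of 0 it always reads "dn1". Waiting on the CompletionService with the threshold as a timeout means a fast primary costs nothing extra, which matches run 1) above where a 500ms threshold rarely triggers. A real client would also have to retry failed reads and choose the second replica carefully (e.g. skipping dead nodes).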

This message was sent by Atlassian JIRA
