hadoop-hdfs-issues mailing list archives

From "Bob Hansen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8696) Reduce the variances of latency of WebHDFS
Date Wed, 09 Sep 2015 19:09:47 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob Hansen updated HDFS-8696:
-----------------------------
    Description: 
There is an issue that appears to be related to the WebHDFS server. When two requests are made concurrently,
the DataNode (DN) will sometimes pause for extended periods (I've seen pauses of 1-300 seconds), killing
performance and dropping connections.

To reproduce: 
1. Set up an HDFS cluster.
2. Upload a large file (I was using 10GB) to /tmp/bigfile. Perform 1-byte reads in a loop,
appending the elapsed time of each read to /tmp/times.txt:
{noformat}
i=1
# Loop forever; each iteration appends (-a) one elapsed time in seconds (%e) to /tmp/times.txt.
while true; do
echo $i
let i++
/usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1";
done
{noformat}

3. Watch for 1-byte requests that take one second or longer (any elapsed time that does not start with 0):
tail -F /tmp/times.txt | grep -E "^[^0]"

4. After the small-read loop has had a chance to warm up, start doing large
(whole-file) transfers from another shell:
{noformat}
i=1
# Loop forever; each iteration reads the whole file and prints its elapsed time in seconds (%e).
while true; do
echo $i
let i++
/usr/bin/time -f %e curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root";
done
{noformat}
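
The two loops above differ only in the length parameter of the OPEN request. As a minimal sketch, the URL could be built by a small helper. The function name webhdfs_open_url is hypothetical; the port 50070, the user root, and the path layout are copied from the commands above and may differ on other clusters.

```shell
# Hypothetical helper: print a WebHDFS OPEN URL like the ones used in the loops above.
# $1 = NameNode host, $2 = absolute HDFS path, $3 = optional number of bytes to read
webhdfs_open_url() {
  if [ -n "${3:-}" ]; then
    # With a length, only that many bytes are requested (the 1-byte read case).
    printf 'http://%s:50070/webhdfs/v1%s?op=OPEN&user.name=root&length=%s\n' "$1" "$2" "$3"
  else
    # Without a length, the whole file is requested (the large transfer case).
    printf 'http://%s:50070/webhdfs/v1%s?op=OPEN&user.name=root\n' "$1" "$2"
  fi
}
```

It would be used as curl -s -L -o /dev/null "$(webhdfs_open_url <namenode> /tmp/bigfile 1)" for the small reads, and without the trailing 1 for the large transfers.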

After a minute or two, it is easy to see that small reads will sometimes
pause for 1-300 seconds. In some extreme cases, it appears that the
transfers time out and the DN drops the connection.
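
To put a number on the pauses rather than watching the tail output, the samples in /tmp/times.txt can be summarized afterwards. The following is a rough sketch, not part of the original reproduction: the function name summarize_times is hypothetical, and it assumes each sample is a plain decimal number on its own line, skipping anything else (such as status text written when curl fails).

```shell
# Hypothetical helper: summarize elapsed times (one per line, in seconds) from a file.
# Prints the sample count, how many took one second or longer, the maximum, and the mean.
summarize_times() {
  awk '
    /^[0-9]+(\.[0-9]+)?$/ {          # only lines that are plain numbers
      t = $1 + 0                     # force numeric interpretation
      n++
      sum += t
      if (t >= 1) slow++             # same threshold as the grep in step 3
      if (t > max) max = t
    }
    END {
      if (n == 0) { print "samples=0"; exit }
      printf "samples=%d slow=%d max=%.2f mean=%.2f\n", n, slow, max, sum / n
    }
  ' "$1"
}
```

Running summarize_times /tmp/times.txt after the loops have been going for a few minutes gives a one-line summary that can be compared before and after a patch.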

  was:
There is an issue that appears to be related to the WebHDFS server. When two requests are made concurrently,
the DataNode (DN) will sometimes pause for extended periods (I've seen pauses of 1-300 seconds), killing
performance and dropping connections.

To reproduce: 
1. Set up an HDFS cluster.
2. Upload a large file (I was using 10GB) to /tmp/bigfile. Perform 1-byte reads in a loop,
appending the elapsed time of each read to /tmp/times.txt:
{noformat}
i=1
# Loop forever; each iteration appends (-a) one elapsed time in seconds (%e) to /tmp/times.txt.
while true; do
echo $i
let i++
/usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root&length=1";
done
{noformat}

3. Watch for 1-byte requests that take one second or longer (any elapsed time that does not start with 0):
tail -F /tmp/times.txt | grep -E "^[^0]"

4. After the small-read loop has had a chance to warm up, start doing large
(whole-file) transfers from another shell:
{noformat}
i=1
# Loop forever; each iteration reads the whole file and prints its elapsed time in seconds (%e).
while true; do
echo $i
let i++
(/usr/bin/time -f %e curl -s -L -o /dev/null "http://<namenode>:50070/webhdfs/v1/tmp/bigfile?op=OPEN&user.name=root");
done
{noformat}

After a minute or two, it is easy to see that small reads will sometimes
pause for 1-300 seconds. In some extreme cases, it appears that the
transfers time out and the DN drops the connection.


> Reduce the variances of latency of WebHDFS
> ------------------------------------------
>
>                 Key: HDFS-8696
>                 URL: https://issues.apache.org/jira/browse/HDFS-8696
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>    Affects Versions: 2.7.0
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
