hadoop-hdfs-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1771) Add slow IO disk test to fault injection test
Date Wed, 17 Jul 2019 15:47:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887186#comment-16887186 ]

Eric Yang commented on HDDS-1771:

{quote}With an always slow disk the scm can't be started therefore there couldn't be any in-flight
connections. {quote}

Not true.  Even with a slow disk, it is possible to start scm.  When disk IO is barely
sufficient, scm can start and write data to disk buffers (application-side cache);
it only starts to degrade after some IO operations.  In this case, an IOException may be
thrown once the scm disk is detected to be the bottleneck.

{quote}It's not a ready to use test, I can't schedule it to run every night.{quote}

Not true.  Try creating a Maven job that runs:

mvn -f pom.ozone.xml clean verify -Dmaven.javadoc.skip=true -Pit,docker-build,dist

This command works when the HDDS-1554 and HDDS-1771 patches are both applied.

{quote}I think the real question (at least for me) is that how the intermittent/random read/write
failures/slowness are handled, but this approach can't test these questions.{quote}

In our meeting we agreed not to conflate the separate issues of slow disk and intermittent
failures, so we have a separate ticket, HDDS-1773, for intermittent failures.  This was
based on your feedback about not conflating separate issues.  Do you wish to combine both
tickets now or continue to discuss them separately?

> Add slow IO disk test to fault injection test
> ---------------------------------------------
>                 Key: HDDS-1771
>                 URL: https://issues.apache.org/jira/browse/HDDS-1771
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Eric Yang
>            Priority: Major
>         Attachments: HDDS-1771.001.patch, HDDS-1771.002.patch, HDDS-1771.003.patch
> In fault injection testing, one possible simulation is slow disk IO.  This test
> can help develop a set of timing profiles that work for an Ozone cluster.  When we
> write to a file, the data travels through several buffers and caches before it is
> actually written to the disk.  By controlling the cgroup blkio rate in the Linux
> kernel, we can simulate slow disk reads and writes.  Docker provides the following
> parameters to control the cgroup:
> {code}
> --device-read-bps=""
> --device-write-bps=""
> --device-read-iops=""
> --device-write-iops=""
> {code}
> The test will be added to the read/write test, with the docker compose file as a
> parameter, to test the timing profiles.
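
For illustration only, the four Docker parameters listed above could be combined roughly as follows. This is a minimal sketch, not the contents of the HDDS-1771 patch: the helper name throttle_flags, the device path /dev/sda, the 1mb and 100 limits, and the image name example/ozone-image are all hypothetical placeholders. The script only prints the command (a dry run), so it does not need Docker to be installed.

```shell
#!/bin/sh
# Build the blkio throttling flags for a single block device.
# throttle_flags is a hypothetical helper, not part of any patch.
throttle_flags() {
  device="$1"   # block device path, e.g. /dev/sda (assumption)
  bps="$2"      # bytes-per-second limit, e.g. 1mb (assumption)
  iops="$3"     # IO-operations-per-second limit, e.g. 100 (assumption)
  printf '%s %s %s %s' \
    "--device-read-bps=${device}:${bps}" \
    "--device-write-bps=${device}:${bps}" \
    "--device-read-iops=${device}:${iops}" \
    "--device-write-iops=${device}:${iops}"
}

# Dry run: print the docker command instead of starting a container.
# example/ozone-image is a placeholder image name.
echo "docker run --rm $(throttle_flags /dev/sda 1mb 100) example/ozone-image"
```

Varying the bps and iops values per run is one way to explore different timing profiles. The same limits would need to be expressed in the compose file for the compose-based test described above.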

This message was sent by Atlassian JIRA
