hadoop-hdfs-issues mailing list archives

From "Xiaowei Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues
Date Mon, 11 Jul 2016 16:43:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371124#comment-15371124 ]

Xiaowei Zhu commented on HDFS-9890:
-----------------------------------

I found the root cause of the non-deterministic failures in our unit tests. Our patch changes
the number of threads in FileSystemImpl::FileSystemImpl(...) in filesystem.cc from 1 to 2,
and that change causes the failures. I verified this against the latest HDFS-8707 branch and
reproduced the same issue there by increasing the number of threads. The change was introduced
in the original 000.patch and is not really related to the scope of this JIRA, so I plan to
change the thread count back to 1 and file a separate JIRA for the issue I found.

> libhdfs++: Add test suite to simulate network issues
> ----------------------------------------------------
>
>                 Key: HDFS-9890
>                 URL: https://issues.apache.org/jira/browse/HDFS-9890
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: Xiaowei Zhu
>         Attachments: HDFS-9890.HDFS-8707.000.patch, HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch,
> HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, HDFS-9890.HDFS-8707.005.patch,
> HDFS-9890.HDFS-8707.006.patch, HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch,
> HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, HDFS-9890.HDFS-8707.011.patch,
> HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch,
> HDFS-9890.HDFS-8707.013.patch, HDFS-9890.HDFS-8707.014.patch, HDFS-9890.HDFS-8707.015.patch,
> hs_err_pid26832.log, hs_err_pid4944.log
>
>
> I propose adding a test suite that simulates various network issues and failures, in order to
> get good test coverage of the retry paths that are hard to hit in mock unit tests.
> At the moment, the only tests that exercise the retry paths are the gmock unit tests. The
> gmock tests are only as good as their mock implementations, which do a great job of simulating
> protocol correctness but not more complex interactions. They also can't realistically simulate
> the kinds of lock contention and subtle memory stomps that show up during hundreds or thousands
> of concurrent reads. We should add a new minidfscluster test that focuses on heavy read/seek
> load and randomly converts the return codes of network functions into errors.
> List of things to simulate (while heavily loaded), roughly in order of how badly I think
> they need to be tested at the moment:
> - RPC connection disconnect
> - RPC connection slowed down enough to cause a timeout and trigger a retry
> - DN connection disconnect
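The core technique in the proposal is fault injection: let the real network calls run, but randomly turn some of their successful return codes into errors so that retry paths get exercised under load. Below is a minimal sketch, assuming network functions report their outcome as a `std::error_code`; the `FaultInjector` class, its `failure_rate` and `seed` parameters, and the choice of `connection_reset` as the injected error are all hypothetical, not part of libhdfs++.

```cpp
#include <cstdint>
#include <functional>
#include <random>
#include <system_error>

// Hypothetical fault injector: wraps a network call that reports its outcome
// as a std::error_code and, with probability `failure_rate`, replaces a
// successful result with an injected error. A fixed seed makes the sequence
// of injected failures reproducible.
class FaultInjector {
 public:
  FaultInjector(double failure_rate, std::uint32_t seed,
                std::error_code injected =
                    std::make_error_code(std::errc::connection_reset))
      : rng_(seed), coin_(failure_rate), injected_(injected) {}

  std::error_code Call(const std::function<std::error_code()>& network_call) {
    std::error_code ec = network_call();  // the real operation still runs
    if (!ec && coin_(rng_)) {
      ++injected_count_;
      return injected_;  // success converted into a simulated failure
    }
    return ec;  // genuine errors pass through unchanged
  }

  int injected_count() const { return injected_count_; }

 private:
  std::mt19937 rng_;
  std::bernoulli_distribution coin_;  // true with probability failure_rate
  std::error_code injected_;
  int injected_count_ = 0;
};
```

Seeding the generator means a failure seen under heavy load can be replayed with the same sequence of injected errors. Genuine errors are passed through untouched, so the injector only adds failures and never hides real ones.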

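To check each scenario in the list deterministically, faults can be scripted per attempt and latency simulated against a timeout instead of measured with real sleeps. In this hypothetical sketch, `Fault`, `CallWithRetry`, and the timing parameters are illustrative names only, and every fault kind is treated as a retryable failure of a single call; a real client may recover differently, for example by reading from another DataNode.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical fault kinds mirroring the three scenarios in the list.
enum class Fault { kNone, kRpcDisconnect, kRpcSlow, kDnDisconnect };

struct AttemptResult {
  bool ok;
  int attempts;
};

// Each attempt consumes the next scripted fault (kNone once the script runs
// out). Latency is simulated, so nothing actually sleeps.
AttemptResult CallWithRetry(const std::vector<Fault>& script,
                            std::chrono::milliseconds timeout,
                            std::chrono::milliseconds slow_latency,
                            int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    std::size_t i = static_cast<std::size_t>(attempt - 1);
    Fault fault = i < script.size() ? script[i] : Fault::kNone;
    bool failed = false;
    switch (fault) {
      case Fault::kRpcDisconnect:
      case Fault::kDnDisconnect:
        failed = true;  // connection dropped mid-request
        break;
      case Fault::kRpcSlow:
        failed = slow_latency > timeout;  // fails only if it exceeds the timeout
        break;
      case Fault::kNone:
        break;
    }
    if (!failed) return {true, attempt};
  }
  return {false, max_attempts};
}
```

For instance, the script `{kRpcDisconnect, kRpcSlow, kDnDisconnect}` with a 100 ms timeout and 250 ms of simulated slowness succeeds on the fourth attempt when up to five attempts are allowed.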


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


