hadoop-hdfs-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1458) Create a maven profile to run fault injection tests
Date Mon, 06 May 2019 09:17:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833629#comment-16833629
] 

Elek, Marton commented on HDDS-1458:
------------------------------------

Thanks for the answers [~eyang].

I think our views are getting closer. At least we can agree on some technical properties. The big
difference (IMHO) is that we have different views about the importance of some issues/use cases.
(The proposed solution introduces heavy limitations on the current usage patterns.)

The other difference is that I can see multiple, different types of container usage. Correct
me if I am wrong, but as far as I understood, you would like to use containers in the same
way everywhere. I will try to explain this at the end of this comment.


bq. The docker process adds 32 seconds. It is 26% increase in build time, 

Yes, we agree that it's slower.

Without an SSD it's even slower.

Yes, I think it's a problem; this is the reason why I put it under the "Cons" list.

No, we can't skip the docker image creation, as the pseudo-cluster creation should be available
all the time (as it is with the current solution). I think BOTH
the unit tests AND the integration tests should be checked ALL the time.

SUMMARY:

 * What we agree on: the build is significantly slower
 * What we don't agree on: I think we need to keep it simple and fast to execute the smoketest,
ALL the time. The proposed solution doesn't support this.

bq. 2. It's harder to test the patches locally. The reproducibility is decreased. (Instead
of the final build, a local container is used).

bq. Not true. Docker can use a local container or use official image by supplying -DskipDocker
flag, and let docker compose yaml file decide to use local image or official released binary
for fault-injection-test maven project. If you want to apply patch to official released docker
image, then you are already replacing binaries in docker image, it is no longer the official
released docker image. Therefore, what is wrong with using a local image that gives you exactly
the same version of everything that is described in Dockerfile of the official image?

Not exactly, I am talking about a different thing. Let's imagine that you start two builds in parallel.
How do you know which image is used to execute the tests? You can't be sure. There is no direct
relationship between the checked-out source code, the docker image which is created, and the compose
files. (We would need build-specific tag names for that, saved somewhere.)

By mounting the volumes (as we do it now) we can be sure, and you can execute multiple smoketests
in parallel.
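
Just to make this concrete (a hypothetical sketch; the image name, the OZONE_IMAGE variable and the tag scheme are only examples, not part of the current build): with image-based testing each parallel build would need its own, build-specific tag, and the compose files would have to be parameterized with it:

{code}
# Hypothetical sketch: two builds running in parallel on the same machine.
# Each must produce a uniquely tagged image and pass the tag to the compose
# file (which would need to reference ${OZONE_IMAGE}), otherwise the builds
# overwrite and pick up each other's image.
TAG="$(git rev-parse --short HEAD)-$BUILD_ID"
docker build -t ozone-test:"$TAG" .
OZONE_IMAGE="ozone-test:$TAG" docker-compose up -d

# With the current (volume mount) approach each workspace mounts its own
# freshly built distribution directory, so no tag bookkeeping is needed.
{code}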

 SUMMARY:

 * I think we are talking about different things
 * I think parallel execution is broken by the proposed solution


bq. 3. It's harder to test a release package from the smoketest directory.

bq. Smoketest can be converted to another submodule of ozone. The same -DskipDocker can run smoketest
with official build without using local image. I will spend some time on this part of the
project to make sure that I didn't break anything.

Please don't do it. As I wrote, I would like to keep the possibility to execute the smoketests
WITHOUT a build. I think this is a very useful feature to:

 1. Test the convenience binary package during the vote
 2. Smoketest a different install (eg. a kubernetes install)

Please check my last vote. I executed the smoketest for both the src package and the bin package
to be sure that both are good.
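
For reference, this is roughly how I do it (a sketch only; the tarball and directory names depend on the actual release):

{code}
# Sketch: run the smoketests from the convenience binary package, without any
# maven build (archive/directory names depend on the release being tested):
tar xzf hadoop-ozone-0.4.0-alpha.tar.gz
cd ozone-0.4.0-alpha/smoketest
./test.sh     # starts the compose-based pseudo cluster(s) and runs the robot tests
{code}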

SUMMARY:

 * You think it's enough to execute the smoketest during the build
 * I think it's very important to make it possible to run smoketests without a build from
any install (convenience binary, kubernetes, on-prem install, etc.)

bq. Security fixes can be applied later to the docker containers.

bq. The correct design is to swap out the docker container completely with a new version.
Patch and upgrade strategy should not be in-place modification of the docker container that
grows over time. Overlay file system will need to be committed to retain state. By running
container with in place binary replacement can lead to inconsistent state of docker container
when power failure happens.

Again, I am not sure that we are talking about the same problem. I would like to "swap out"
the old docker images completely. And because the first layers are updated, it won't "grow"
over time.

But let's make the conversation cleaner: let's talk about tags. If I understood well, in
case of a security issue you would like to drop (?) all the old images (eg. hadoop:2.9.0,
hadoop:3.1.0, hadoop:3.2.0, hadoop:3.2.1) and create a new one (hadoop:3.2.2).

First of all, I don't know how you would like to drop the images (as of now you need to create
an INFRA ticket for that). But there could be a lot of users of the old images. What would
you do with them? By dropping old images you would break a lot of users (it's exactly the
same as _deleting_ all the old hadoop releases in case of a security issue, but we don't do
that).
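
What I would prefer instead (a hedged sketch; the repository name and Dockerfile layout are only examples): keep the old tags, but rebuild them on top of a patched base layer and push them again, so users of the old tags get the fix without the tag disappearing:

{code}
# Sketch: instead of deleting hadoop:2.9.0, rebuild the same tag from a
# patched base image and push it again; the tag survives, only the layers change.
docker build --no-cache -t apache/hadoop:2.9.0 -f Dockerfile.2.9.0 .
docker push apache/hadoop:2.9.0
{code}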

SUMMARY:

 * AFAIK you would like to support only the latest images
 * I have a lot of use cases where I need old images, and I would like to give them _limited_ support
(eg. eventually update the underlying OS in the container).


bq. 5. It conflicts if more than one build is executed on the same machine (docker images
are shared but volume mounts are separated). (Even just this one is a blocker problem for
me)

bq. Does this mean you have multiple source tree running build? Why not use multiple start-build-env.sh
on parallel source trees? I think it can provide the same multi-build process, but need to
test it out.

Yes, it means multiple builds.

Yes, I would like to support it even without start-build-env.sh

Yes, I would like to support it on Jenkins (where start-build-env is not used)

No, start-build-env doesn't solve the problem, as you added the following line:

{code}
+  -v "/var/run/docker.sock:/var/run/docker.sock" \
{code}

This is added by Jenkins anyway, so you can't fix it just by removing it from start-build-env.sh.
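
To make the problem explicit (a sketch; the build image name is only an example and is assumed to contain the docker client): with the mounted socket every build environment talks to the same host docker daemon, so parallel builds share one image/container namespace:

{code}
# Every container started this way uses the HOST's docker daemon through the
# mounted socket, so parallel builds see (and can overwrite) each other's
# images and compose projects:
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  my-build-env \
  docker images   # lists the host's images, not a private, per-build set
{code}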

SUMMARY:

 * I have important use cases which are not supported by the proposed change.


bq. 6. Some tests can't be executed from the final tarball. (I would like to execute tests
from the released binary, as discussed earlier in the original jira).

bq. Same answer as before: use -DskipDocker flag, or use mvn install final tarball, and run
docker build. Maybe I am missing something, please clarify.

I think I already (tried to) clarify this earlier. I would like to execute all the smoketests
(and blockade tests) from the distribution package. It may not be important for you, but please
respect my wish to support it.
 
 * I would like to test the convenience release binaries during a vote (yes, I would like
to be sure that both the src package (the de facto release) and the convenience binary package
are fine).

 * I would like to execute the smoketests in different environments: for example on kubernetes
nodes, to be sure that it's installed well (I do it even now, and it's very useful). See the sketch below.
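
A rough, hypothetical sketch of what I mean (the pod name and paths are only illustrative, and it assumes the robot framework is available in the image):

{code}
# Hypothetical sketch: run the same robot-based smoketests against the pods
# of a kubernetes install (pod name and paths are only examples):
kubectl cp smoketest om-0:/opt/hadoop/smoketest
kubectl exec -it om-0 -- robot /opt/hadoop/smoketest/basic/basic.robot
{code}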

SUMMARY:

 * I have important use cases which are not supported by the proposed change.

bq. I have a hard time with Ozone stable environment. Hadoop-runner is a read only file system
without symlink support, and all binaries and data are mounted from external volume. What
benefit does this provide? 

I did my best to enumerate the benefits (see the cons list): reproducible test execution,
parallel test execution, the ability to execute the tests with exactly the same bits.

bq. Everything is interacting with outside of the container environment. What containment
does this style of docker image provides? I only see process namespace isolation, network
interface simulation. But it lacks of ability to make docker container idempotent and reproducible
else where.

You are 100% right. Here (for testing) we use only namespace and network isolation and a reproducible
runtime environment. I think what you are looking for is another use case (what I called
option (1) in my previous comment). And I totally agree that we need to support BOTH.

SUMMARY:

 * You would like to use fully portable and self-contained containers for all the use cases.
 * I can see multiple styles of container usage, and while I would like to support the style
that you prefer (see the k8s-dev profile and the apache/ozone image), I also would like to use
containers for development and testing with a different 'style' to get a better, faster and
more effective development experience.

bq. This can create problems that code only works in one node but can not reproduce else where.
I often see distributed software being developed on one laptop, and having trouble to run
in cluster of nodes. The root cause is the mounted volume is shared between containers, and
developers often forgot about this and wrote code that does local IO access. When it goes
to QA, nothing works in distributed nodes. It would be good to prevent ourselves from making
this type of mistakes by not sharing the same mount points between containers as base line.
I know that you may not have this problem with deeper understanding of distributed file system.
However, it is common mistake among junior system developers. Application programmers may
use shared volumes to exchange data between containers, but not the team that suppose to build
distributed file system. I would like to prevent this type of silliness from happening by
making sure that ozone processes don't exchange data via shared volumes.

Thanks for this comment, as it's a very technical concern, and I understand it. If I understood well,
you say that two components may eventually write the same file. I think there is a very low
chance of this kind of problem. I have never seen it until now, and with our current code structure
(we write most of the data through the Metadata interface) it is very hard (especially as we have a very
strong code review process).

But I understand the risk, and that's one reason why I would like to support executing the smoketests
on kubernetes. And yes, for kubernetes we need to create the real containers, exactly what
you would like to use (see, again, the k8s-dev profile).

In fact the local writable areas of the components are part of the containers (/data or /tmp
usually) and are not mounted.

SUMMARY:

 * You have some concerns about the problems that can happen when we try to write the same
files from multiple components.
 * I think the chance of such problems is low and we haven't seen any until now.
Adding the ":ro" flag to the local mount can solve this problem (see the sketch below).

Let's talk about testing. For testing we usually have multiple layers (sometimes called
the Test Pyramid: https://martinfowler.com/bliki/TestPyramid.html). We may have unit tests,
integration tests, acceptance tests, etc.

I have a very similar view about container usage. There are multiple layers of container
usage. I think the first layer is to use containers for network/disk layout isolation,
and the next level is to use totally independent and portable containers. And I totally agree
with you about the benefit of the second level (portable containers), and I think it's very important
to support that level. At this level we have the apache/ozone docker image (totally portable,
self-contained) and we have a tool to create similar containers for the dev images (the k8s-dev
and k8s-dev-push profiles).

The big difference in our views is that I can see a very big benefit in using containers at level
one, where we use only the network/disk layout isolation. This is not a replacement for the
usage of full containers but another level of usage, to make it easy to develop and test
(as you can see, most of my comments are related to the development and testing process).

I think it's a very important distinction. And the root cause of the conflict between our
views is that you expect level-2 container usage in places where we use containers only at level 1
(because it's more effective there).

bq. I haven't seen a project in Dockerhub that use the same technique as option 2. Can you
show me some examples of similar projects?

Dockerhub is usually used to store level-2 images, but there are patterns where the containers are
used just as an environment: for example, when a simple go docker image is used to build an
application (and not with a multi-stage docker build). See the sketch below.
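
A well-known example of this 'environment only' pattern (only an illustration, unrelated to our build): the official golang image used purely as a build environment, with the source mounted from the host:

{code}
# The container is only an environment: the source comes from the host and the
# resulting binary is written back to the host, nothing is baked into an image.
docker run --rm \
  -v "$(pwd)":/usr/src/app \
  -w /usr/src/app \
  golang:1.12 go build -v ./...
{code}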

> Create a maven profile to run fault injection tests
> ---------------------------------------------------
>
>                 Key: HDDS-1458
>                 URL: https://issues.apache.org/jira/browse/HDDS-1458
>             Project: Hadoop Distributed Data Store
>          Issue Type: Test
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: HDDS-1458.001.patch, HDDS-1458.002.patch, HDDS-1458.003.patch
>
>
> Some fault injection tests have been written using blockade.  It would be nice to have
ability to start docker compose and exercise the blockade test cases against Ozone docker
containers, and generate reports.  This is optional integration tests to catch race conditions
and fault tolerance defects. 
> We can introduce a profile with id: it (short for integration tests).  This will launch
docker compose via maven-exec-plugin and run blockade to simulate container failures and timeout.
> Usage command:
> {code}
> mvn clean verify -Pit
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

