hadoop-common-dev mailing list archives

From: Chris Douglas <cdoug...@apache.org>
Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests
Date: Mon, 02 Oct 2017 22:09:03 GMT
Eric/Steve-

Please pick a test, any test, and demonstrate why Powermock would
improve testing in Hadoop by any metric. -C



On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <eyang@hortonworks.com> wrote:
> Mocking provides a tool chain for simulating the execution of a piece of code. It helps
> prevent null pointer exceptions and reduces unexpected runtime exceptions. When a piece of
> code comes with a well-defined unit test, the test gives great insight into the author's
> intention and reasoning. However, everyone looks at code from a different perspective, and
> it is often easier to rewrite the code than to modify and update the tests. The shortcoming
> of rewriting code is that there is always a danger of losing its existing purpose, or a
> workaround buried deep in it. On the other hand, if a test program is filled with several
> pages of initialization code and overrides, it is hard to get the context of the test case,
> and easy to lose its original meaning. Hence, both mocking and full integration tests have
> drawbacks.
>
> I was initially in favor of using Powermock to give users the ability to unit test a class
> with less external interference. However, I quickly came to realize that there are
> differences between Hadoop's protocol buffer serialization and Java reflection-based
> serialization that prevent Powermock from working with certain Hadoop classes.
>
> Hadoop unit tests are written to cover more than one class, and frequently a mini-cluster
> is spawned to test 5-10 lines of code. Any simple API test triggers the initialization of a
> large portion of Hadoop code, so making the Hadoop code base work with Powermock would
> require too much effort. Programs outside of Hadoop can use a Powermock annotation to
> prevent Powermock from mocking Hadoop classes, such as:
> @PowerMockIgnore({"javax.management.*", "javax.xml.*", "org.w3c.*", "org.apache.hadoop.*", "com.sun.*"})
> Within the Hadoop code base, however, this technique is not practical, because every Hadoop
> class has the org.apache.hadoop prefix. Maintaining the list of package prefixes that cannot
> work with Powermock's reflection would be a heavy upkeep.
> Hence, I am dropping my case for re-opening this issue.
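A minimal sketch of the package-prefix problem Eric describes, assuming (hypothetically) that each ignore pattern ending in ".*" behaves as a plain prefix match. IgnoreMatcher and isIgnored are illustrative names, not Powermock API, and Powermock's own matching may differ in detail. It shows that once "org.apache.hadoop.*" is on the list, every class inside the Hadoop code base matches, including the class under test.

```java
import java.util.List;

/**
 * Hypothetical approximation of @PowerMockIgnore-style pattern matching.
 * Not Powermock's implementation: a pattern ending in ".*" is treated as a
 * prefix match, any other pattern as an exact class-name match.
 */
final class IgnoreMatcher {
    private IgnoreMatcher() {}

    static boolean isIgnored(String className, List<String> patterns) {
        for (String pattern : patterns) {
            if (pattern.endsWith(".*")) {
                // Keep the trailing '.', so "org.w3c.*" matches "org.w3c.dom.Node"
                // but not "org.w3cx.Foo".
                String prefix = pattern.substring(0, pattern.length() - 1);
                if (className.startsWith(prefix)) {
                    return true;
                }
            } else if (className.equals(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of(
            "javax.management.*", "javax.xml.*", "org.w3c.*",
            "org.apache.hadoop.*", "com.sun.*");
        // Inside Hadoop, the class under test matches too, so it would be left
        // to the system class loader rather than prepared for mocking.
        System.out.println(isIgnored("org.apache.hadoop.yarn.client.api.YarnClient", patterns));
        // A downstream application class does not match any pattern.
        System.out.println(isIgnored("com.example.MyYarnApp", patterns));
    }
}
```

Run as written, main prints true and then false: the exclusion list works for downstream code but swallows Hadoop's own classes.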
>
> Regards,
> Eric
>
> From: Steve Loughran <stevel@hortonworks.com>
> Date: Sunday, October 1, 2017 at 12:36 PM
> To: Eric Yang <eyang@hortonworks.com>
> Cc: Andrew Wang <andrew.wang@cloudera.com>, Chris Douglas <cdouglas@apache.org>,
> "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests
>
>
> On 29 Sep 2017, at 22:46, Eric Yang <eyang@hortonworks.com> wrote:
>
> Hi Chris and Andrew,
>
> The intent is for new code to have better unit test cases without resorting to invoking
> MiniDFSCluster or MiniYARNCluster. Existing code doesn't require refactoring if its test
> cases already have good coverage. I am currently working on part of YARN to improve YARN
> and Docker integration. A lot of code gets triggered, from UGI and FileSystem objects to
> YARN job submission. My code is only responsible for checking the logic of the user input
> and the expected output prior to YarnClient job submission. Starting a mini-cluster for
> this test case is excessive for such a small piece of validation code. The submission code
> was imported from Slider for YARN native services, and a single class imports various
> Hadoop services. In several failure cases, it is difficult to simulate the exact error
> conditions because the API is several layers deep. Powermock provides an easy way to
> replace and stub return objects, or throw the proper exception, to simulate the failure
> conditions. One can argue that the code should have been written to be easier to unit
> test, but Hadoop's code density makes even simple initialization far from trivial.
> Constructor suppression, inner class replacement and private method override are good
> tools from Powermock that can provide more accurate testing without losing sight of
> multi-stage API call tests, while keeping the test case localized to a small piece of the
> greater puzzle. Hence, I would like to ask the community to reconsider the improvements
> that Powermock can bring to the table. Thank you for your consideration.
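Eric's failure-simulation case can also be approached without bytecode manipulation, by giving the class under test a constructor-injected seam. The following is a sketch of that swapped-in technique (dependency injection with a hand-written stub), not of Powermock; AppSubmitter, AppLauncher and the "application_0001" string are hypothetical stand-ins for the layers that eventually call YarnClient.

```java
import java.io.IOException;

/** Hypothetical seam over the code that eventually performs YARN submission. */
interface AppSubmitter {
    String submit(String appSpec) throws IOException;
}

/** Hypothetical class under test: validates input, then delegates submission. */
final class AppLauncher {
    private final AppSubmitter submitter;

    AppLauncher(AppSubmitter submitter) {
        this.submitter = submitter;
    }

    /** Returns the application id on success, or a message describing the problem. */
    String launch(String appSpec) {
        if (appSpec == null || appSpec.isBlank()) {
            return "rejected: empty spec";
        }
        try {
            return submitter.submit(appSpec);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }
}

final class AppLauncherDemo {
    public static void main(String[] args) {
        // Lambda stubs replace the whole submission stack; one throws on demand,
        // so no mini-cluster or constructor suppression is needed.
        AppSubmitter failing = spec -> { throw new IOException("RM unavailable"); };
        AppSubmitter succeeding = spec -> "application_0001"; // placeholder id

        System.out.println(new AppLauncher(failing).launch("{\"name\":\"sleeper\"}"));
        System.out.println(new AppLauncher(succeeding).launch("{\"name\":\"sleeper\"}"));
        System.out.println(new AppLauncher(succeeding).launch(" "));
    }
}
```

The trade-off is the one Chris and Steve raise: the seam has to exist in production code, whereas Powermock can reach code that was not written with one.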
>
> I don't know enough about Powermock to have opinions on the matter. I do know I don't
> like mocking in general (https://www.slideshare.net/steve_l/i-hate-mocking), or at least in
> the one area where I find it most troublesome: maintaining code.
>
>
> I just find mock-based tests to be very brittle to changes in the codepaths of the classes
> called, so whenever you change the implementation, tests fail. And it's not so much a "your
> code has regressed and we correctly caught it" failure as a "the change in order of
> invocation caused our test to report a regression when there wasn't really one" kind of
> failure. That is bad, as you waste time working out that this is the cause, and then often
> fix the problem by moving bits of the test around until it stops failing, which can hide
> real regressions.
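A minimal, hypothetical illustration of that brittleness: a check on the exact order of calls recorded by a test double fails after a harmless reordering, while a check on the resulting state still passes. KeyValueStore, RecordingStore and Writers are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dependency of the code under test. */
interface KeyValueStore {
    void put(String key, String value);
}

/** Hand-written test double: records each call and keeps the resulting state. */
final class RecordingStore implements KeyValueStore {
    final List<String> calls = new ArrayList<>();
    final Map<String, String> state = new HashMap<>();

    @Override
    public void put(String key, String value) {
        calls.add("put(" + key + ")");
        state.put(key, value);
    }
}

/** Two versions of the same logic; only the order of the writes differs. */
final class Writers {
    static void writeV1(KeyValueStore store) {
        store.put("a", "1");
        store.put("b", "2");
    }

    static void writeV2(KeyValueStore store) { // "refactored" version
        store.put("b", "2");
        store.put("a", "1");
    }

    public static void main(String[] args) {
        List<String> expectedCalls = List.of("put(a)", "put(b)");
        Map<String, String> expectedState = Map.of("a", "1", "b", "2");

        RecordingStore v1 = new RecordingStore();
        writeV1(v1);
        RecordingStore v2 = new RecordingStore();
        writeV2(v2);

        // Interaction-style check: breaks on the harmless reordering in V2.
        System.out.println(v1.calls.equals(expectedCalls) + " " + v2.calls.equals(expectedCalls));
        // State-style check: passes for both versions.
        System.out.println(v1.state.equals(expectedState) + " " + v2.state.equals(expectedState));
    }
}
```

Running main prints "true false" and then "true true".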
>
> Where mocking can be good:
>
> 1. You can make assertions about how things were invoked, though note that in S3A we've
> moved towards actually instrumenting the code and asserting on that. This way our shipping
> code gets to enjoy better instrumentation. [Note: those assertions can be brittle to
> changes in implementation too.]
>
> 2. You can simulate failure better. But for S3Guard/S3A we've gone and implemented an
> InconsistentS3Client, which ships in the hadoop-aws JAR and so can be used downstream.
>
> 3. You can test things without needing so much supporting infrastructure (e.g. in unit
> tests and on Jenkins, without needing logins or running services).
>
> 4. You can have faster tests, because there's no need to set up/tear down things like
> HDFS.
>
> 5. You can isolate problems to the code under test, rather than looking at the logs of
> forked processes collected somewhere under target/
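The approach in points 1 and 2 (instrument the shipping code and assert on its counters, and inject failures through a substitute client) might look roughly like this. It is a sketch under assumptions: FetchStatistics, RetryingFetcher, RemoteCall and FlakyCall are hypothetical and are not S3A's instrumentation classes or the InconsistentS3Client API.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counters that ship with the production code. */
final class FetchStatistics {
    final AtomicLong requests = new AtomicLong();
    final AtomicLong retries = new AtomicLong();
}

/** Hypothetical remote call; tests can supply a fault-injecting implementation. */
interface RemoteCall {
    String run() throws IOException;
}

/** Retries a remote call and records what it did in FetchStatistics. */
final class RetryingFetcher {
    private final FetchStatistics stats;
    private final int maxAttempts;

    RetryingFetcher(FetchStatistics stats, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.stats = stats;
        this.maxAttempts = maxAttempts;
    }

    String fetch(RemoteCall call) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            stats.requests.incrementAndGet();
            try {
                return call.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    stats.retries.incrementAndGet(); // another attempt will follow
                }
            }
        }
        throw last; // non-null: at least one attempt ran and failed
    }
}

/** Fault injector: fails the first N invocations, then succeeds. */
final class FlakyCall implements RemoteCall {
    private int failuresLeft;

    FlakyCall(int failures) {
        this.failuresLeft = failures;
    }

    @Override
    public String run() throws IOException {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("injected failure");
        }
        return "ok";
    }
}
```

A test can then check that a call failing twice and succeeding on the third attempt records three requests and two retries, without verifying the order of any internal calls.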
>
> I think Eric's looking at #4 and #5, which are significant for tests that need a MiniYARN
> cluster. If Powermock helps with this, I don't see why we should say "don't use it", as long
> as we are aware of the cost: the risk of creating tests that are brittle to changes in the
> implementation code.
>
>
> FWIW, mocking is why I couldn't make the init/start/stop methods of
> org.apache.hadoop.service.AbstractService final; the need to test with mocking can impact
> production code. Is that bad? Well, we do other things to code to aid testability...
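A simplified, hypothetical sketch of the design tension Steve mentions: public lifecycle methods are final and enforce state transitions, while subclasses, including test doubles, override protected hooks. SimpleService and RecordingService are illustrative names loosely modelled on an init/start/stop lifecycle; they are not Hadoop's AbstractService. Subclass-based mocking tools cannot override final methods, which is why the test double here overrides only the hooks.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical lifecycle base class (not Hadoop's AbstractService). */
abstract class SimpleService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;

    // final: neither subclasses nor subclass-based mocks can skip the state checks
    public final void init() {
        require(State.NOTINITED);
        serviceInit();
        state = State.INITED;
    }

    public final void start() {
        require(State.INITED);
        serviceStart();
        state = State.STARTED;
    }

    public final void stop() {
        if (state == State.STOPPED) {
            return; // stopping twice is a no-op
        }
        serviceStop();
        state = State.STOPPED;
    }

    public final State getState() {
        return state;
    }

    // Extension points that subclasses and test doubles may override.
    protected void serviceInit() {}
    protected void serviceStart() {}
    protected void serviceStop() {}

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}

/** Test double: overrides only the hooks, so the lifecycle rules still apply. */
final class RecordingService extends SimpleService {
    final List<String> events = new ArrayList<>();

    @Override protected void serviceInit() { events.add("init"); }
    @Override protected void serviceStart() { events.add("start"); }
    @Override protected void serviceStop() { events.add("stop"); }
}
```

With this shape, calling start() before init() throws IllegalStateException even on the test double, which is exactly the guarantee that is lost when the lifecycle methods must stay non-final for mocking.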
>
>
> -Steve
>
>


