drill-dev mailing list archives

From Hanifi Gunes <hgu...@maprtech.com>
Subject Re: [DISCUSS] Making the drill codebase easier to unit test
Date Wed, 17 Jun 2015 21:22:53 GMT
Some sub-systems that I know of, particularly around readers, writers, VVs
(value vectors) and operators, are not unit-testing friendly by design.
First, they involve much more logic than one could define as a unit. Second,
it is relatively tough, if not impossible, to control their behavior or to
mock or inject dependencies, because they are tightly coupled with other
parts of the system. I would propose starting off with fundamental yet minor
code refactoring that aims to have self-contained, cohesive pieces abstracted
away so that we could get these unit-tested first. Applying this idea
iteratively should bring better test coverage. Then we can focus on testing
operators or other components that rely on these well-tested units. Either
way I would prefer a piecemeal approach rather than trying to unit-test an
entire sub-system.
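
To make the idea concrete, here is a rough sketch of what a small,
injectable unit might look like. Everything in it is hypothetical: the names
BatchSource, ExplicitBatchSource, DropNullsOperator and drain are invented
for illustration and are not Drill classes, and a batch is modeled as a plain
List<Integer> rather than a real record batch. The upstream dependency is
reduced to an interface, so a test can hand the operator exactly the batches
it wants, including an empty batch and a batch full of nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these types exist in Drill.
public class OperatorIsolationSketch {

    /** Upstream dependency, reduced to an interface so a test can inject it. */
    interface BatchSource {
        /** Returns the next batch, or null when the stream is exhausted. */
        List<Integer> next();
    }

    /** Test-only source that replays exactly the batches it was given. */
    static final class ExplicitBatchSource implements BatchSource {
        private final Iterator<List<Integer>> batches;

        @SafeVarargs
        ExplicitBatchSource(List<Integer>... batches) {
            this.batches = Arrays.asList(batches).iterator();
        }

        @Override
        public List<Integer> next() {
            return batches.hasNext() ? batches.next() : null;
        }
    }

    /** Toy operator under test: drops nulls, one output batch per input batch. */
    static final class DropNullsOperator implements BatchSource {
        private final BatchSource incoming;

        DropNullsOperator(BatchSource incoming) {
            this.incoming = incoming;
        }

        @Override
        public List<Integer> next() {
            List<Integer> batch = incoming.next();
            if (batch == null) {
                return null; // end of stream is passed through unchanged
            }
            List<Integer> out = new ArrayList<>();
            for (Integer value : batch) {
                if (value != null) {
                    out.add(value);
                }
            }
            return out;
        }
    }

    /** Drains a source so a test can compare the output batch by batch. */
    static List<List<Integer>> drain(BatchSource source) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> b = source.next(); b != null; b = source.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Batch boundaries are chosen by the test, not by the planner.
        BatchSource input = new ExplicitBatchSource(
                Arrays.asList(1, null, 3),           // mixed batch
                Collections.<Integer>emptyList(),    // empty batch
                Arrays.asList((Integer) null, null), // batch full of nulls
                Arrays.asList(4));
        System.out.println(drain(new DropNullsOperator(input)));
        // prints [[1, 3], [], [], [4]]
    }
}
```

Because the operator only sees the BatchSource interface, the same test keeps
working no matter how the planner or the batch size changes, and operators
built on top of such units can be tested the same way.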

-Hanifi

On Wed, Jun 17, 2015 at 1:53 PM, Abdel Hakim Deneche <adeneche@maprtech.com>
wrote:

> I don't know how much work this involves (it seems like a lot!) but this
> would be really useful. Like you said, with the current model coming up
> with good unit tests can be really tricky, especially when testing the edge
> cases, and the worst part is that any change to how queries are planned or,
> for example, to the size of the batches can make some tests useless.
>
> On Tue, Jun 16, 2015 at 12:38 PM, Jason Altekruse <
> altekrusejason@gmail.com>
> wrote:
>
> > Hello Drill devs,
> >
> > I would like to propose a proactive effort to make the Drill codebase
> > easier to unit test.
> > Many JIRAs have been created for bugs that should have been prevented by
> > better unit testing, and we are still fixing these kinds of bugs today as
> > they crop up. I have a few ideas, and I plan on creating JIRAs for specific
> > refactoring and test infrastructure improvements. Before I do, I would like
> > to collect thoughts from everyone on what can get us the most benefit for
> > our work.
> >
> > As a short overview of the situation today, most of the tests in Drill take
> > the form of running a SQL query on a local drillbit and verifying the
> > results. This has often been described as integration testing rather than
> > unit testing, and it has caused several common testing pains and gaps.
> >
> > 1. Batch boundaries - because we cannot control where batches are cut off
> > during a query, complete queries make it hard to test the different
> > scenarios an operator faces when processing an incoming stream of data
> > with given properties.
> >          - examples of issues: inconsistent behavior between operators;
> >            some operators failed to handle empty batches, or a batch full
> >            of nulls, until we wrote a test that happened to have the right
> >            input file and plan to produce these scenarios
> > 2. Valid planning changes can end up making tests previously designed to
> > test execution fail in new ways as the data will now flow differently
> > through the operators
> > 3. SQL queries as test specifications make it hard to test "everything",
> > all types, all possible data properties/structures, all possible switches
> > flipped in the planner or configuration for an operator
> >
> > I would like to start the discussion with a proposal to fix some of these
> > problems. We need a way to run an operator easily in isolation. Possible
> > steps to achieve this include a new operator, configurable from a test,
> > that produces data in explicitly provided batches. This can serve as a
> > universal input for unit testing operators. We would also need some way
> > to consume and verify the output of the operators. This could share
> > code with the current query execution, or possibly sidestep it to avoid
> > having to mock or instantiate the whole query context.
> >
> > This proposal itself still treats a relatively large part of the system as
> > a single "unit". I would be interested to hear opinions on the utility vs
> > extra effort of trying to refactor more classes so that they can be created
> > in tests and have their individual methods tested. This is already being
> > done for some classes like the value vectors, but it is far from
> > exhaustive. I don't expect us to start rigidly enforcing this level of
> > testing granularity everywhere, but there are components of the system that
> > really need to be resilient and be guaranteed to stay that way as the
> > project evolves.
> >
> > Please chime in with your thoughts.
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
