drill-dev mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: [DISCUSS] Making the drill codebase easier to unit test
Date Thu, 18 Jun 2015 19:16:39 GMT
I agree that code refactoring is necessary to make some components of the
project more testable. Do you have some ideas in particular about coupling
that is blocking this kind of testing today? I know that there are several
context objects like DrillbitContext, FragmentContext and QueryContext that
are relatively heavy and shared amongst a number of components.

Do you think that there are some cases that might be able to be fixed with
less refactoring and instead some test infrastructure enhancements that can
generate these contexts? This should be done in a generalized manner where
they can be grabbed for particular tests from some kind of static
initialization function, avoiding code duplication in the tests themselves.
I haven't tried to do a lot of this testing myself, but I have been under
the impression that this might solve some of our issues. If we have a few
small methods in the unit tests that work to create these objects rather
than try to mock subsets of them we might be able to get some of these
benefits without major core code refactoring.
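
A minimal sketch of the "small methods that create these objects" idea,
under stated assumptions: OperatorTestContext, TestContexts and the option
name "batch.size" are hypothetical names invented for illustration and are
not existing Drill classes or settings. The point is only the shape: one
shared static entry point with sensible defaults, plus a builder for the few
tests that need to override something.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a heavy context object.
// This is NOT Drill's FragmentContext or QueryContext.
interface OperatorTestContext {
    String option(String name);
    long memoryLimitBytes();
}

// Shared test fixture: the one place that knows how to build a usable context,
// so individual tests do not duplicate setup code.
final class TestContexts {
    private TestContexts() {}

    // Default context for tests that do not care about configuration.
    static OperatorTestContext defaults() {
        return builder().build();
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private final Map<String, String> options = new HashMap<>();
        private long memoryLimitBytes = 64L * 1024 * 1024; // arbitrary default

        Builder option(String name, String value) {
            options.put(name, value);
            return this;
        }

        Builder memoryLimitBytes(long limit) {
            this.memoryLimitBytes = limit;
            return this;
        }

        OperatorTestContext build() {
            // Copy the state so later builder changes cannot leak into this context.
            final Map<String, String> snapshot = new HashMap<>(options);
            final long limit = memoryLimitBytes;
            return new OperatorTestContext() {
                @Override public String option(String name) { return snapshot.get(name); }
                @Override public long memoryLimitBytes() { return limit; }
            };
        }
    }
}
```

A test would then call TestContexts.defaults(), or something like
TestContexts.builder().option("batch.size", "2").build() when it needs a
specific setting, instead of mocking a subset of a real context.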

On Wed, Jun 17, 2015 at 2:22 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> Some sub-systems that I know of, particularly around readers, writers, VVs
> and operators are not unit-testing friendly by design: First, they involve
> much more logic than one could define as a unit. Second, it is relatively
> tough if not impossible to control their behavior, mock or inject
> dependencies because they are tightly coupled with other parts of the
> system. I would propose starting off with very fundamental yet minor code
> refactoring that aims to have self-contained, cohesive pieces abstracted
> away so that we could get these unit-tested first. Applying this
> idea iteratively should bring better test coverage. Then we can focus on
> testing operators or other components that rely on these well tested units.
> Either way I would prefer a piecemeal approach rather than trying to
> unit-test an entire sub-system.
>
> -Hanifi
>
> On Wed, Jun 17, 2015 at 1:53 PM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
>
> > I don't know how much work this involves (it seems a lot!) but this
> > would be really useful. Like you said, with the current model coming up
> > with good unit tests can be really tricky, especially when testing the
> > edge cases, and the worst part is that any changes to how queries are
> > planned or, for example, the size of the batches can make some tests
> > useless.
> >
> > On Tue, Jun 16, 2015 at 12:38 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:
> >
> > > Hello Drill devs,
> > >
> > > > I would like to propose a proactive effort to make the Drill codebase
> > > > easier to unit test. Many JIRAs have been created for bugs that should
> > > > have been prevented by better unit testing, and we are still fixing
> > > > these kinds of bugs today as they crop up. I have a few ideas, and I
> > > > plan on creating JIRAs for specific refactoring and test infrastructure
> > > > improvements. Before I do, I would like to collect thoughts from
> > > > everyone on what can get us the most benefit for our work.
> > >
> > > > As a short overview of the situation today, most of the tests in Drill
> > > > take the form of running a SQL query on a local drillbit and verifying
> > > > the results. Plenty of times this has been described as more of
> > > > integration testing than unit testing, and it has caused several common
> > > > testing pains and gaps.
> > >
> > > > 1. Batch boundaries - as we cannot control where batches are cut off
> > > >    during the query, complete queries often make it hard to test
> > > >    different scenarios processing an incoming stream of data with given
> > > >    properties.
> > > >      - examples of issues: inconsistent behavior between operators;
> > > >        some operators have failed to handle empty batches, or a batch
> > > >        full of nulls, until we wrote a test that happened to have the
> > > >        right input file and plan to produce these scenarios
> > > > 2. Valid planning changes can end up making tests previously designed to
> > > >    test execution fail in new ways as the data will now flow differently
> > > >    through the operators.
> > > > 3. SQL queries as test specifications make it hard to test "everything":
> > > >    all types, all possible data properties/structures, all possible
> > > >    switches flipped in the planner or configuration for an operator.
> > >
> > > > I would like to start the discussion with a proposal to fix some of
> > > > these problems. We need a way to run an operator easily in isolation.
> > > > Possible steps to achieve this include a new operator that produces
> > > > data in explicitly provided batches and can be configured from a test.
> > > > This can serve as a universal input to unit test operators. We would
> > > > also need some way to consume and verify the output of the operators.
> > > > This could share code with the current query execution, or possibly
> > > > sidestep it to avoid having to mock or instantiate the whole query
> > > > context.
> > >
> > > > This proposal itself is testing a relatively large part of the system
> > > > as a whole "unit". I would be interested to hear opinions on the
> > > > utility vs extra effort of trying to refactor more classes so that they
> > > > can be created in tests and have their individual methods tested. This
> > > > is already being done for some classes like the value vectors, but it
> > > > is far from exhaustive. I don't expect us to start rigidly enforcing
> > > > this level of testing granularity everywhere, but there are components
> > > > of the system that really need to be resilient and be guaranteed to
> > > > stay that way as the project evolves.
> > >
> > > Please chime in with your thoughts.
> > >
> >
> >
> >
> > --
> >
> > Abdelhakim Deneche
> >
> > Software Engineer
> >
> >   <http://www.mapr.com/>
> >
> >
> >
>
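
The original proposal quoted above (a source that emits explicitly provided
batches, plus a way to consume and verify operator output) could be sketched
roughly as follows. Everything here is a simplified assumption for
illustration: IntBatch, BatchOperator, FixedBatchSource, DropNulls and
OperatorHarness are hypothetical names, and a real Drill operator works on
record batches and value vectors rather than a list of boxed integers. The
sketch only shows how a test can choose batch boundaries itself, including
the empty-batch and all-nulls cases mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified batch: a single nullable integer column.
final class IntBatch {
    final List<Integer> values;

    private IntBatch(List<Integer> values) { this.values = values; }

    static IntBatch of(Integer... values) { return new IntBatch(Arrays.asList(values)); }

    static IntBatch fromList(List<Integer> values) { return new IntBatch(values); }
}

// Hypothetical operator contract: one incoming batch in, one outgoing batch out.
interface BatchOperator {
    IntBatch process(IntBatch incoming);
}

// "Universal input": replays exactly the batches the test supplies,
// so batch boundaries are under the test's control, not the planner's.
final class FixedBatchSource implements Iterator<IntBatch> {
    private final Iterator<IntBatch> it;

    FixedBatchSource(IntBatch... batches) { this.it = Arrays.asList(batches).iterator(); }

    @Override public boolean hasNext() { return it.hasNext(); }
    @Override public IntBatch next() { return it.next(); }
}

// Toy operator under test: removes null values from each batch.
final class DropNulls implements BatchOperator {
    @Override public IntBatch process(IntBatch incoming) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : incoming.values) {
            if (v != null) {
                out.add(v);
            }
        }
        return IntBatch.fromList(out);
    }
}

// Consumes the operator's output batch by batch so a test can verify it.
final class OperatorHarness {
    private OperatorHarness() {}

    static List<List<Integer>> run(BatchOperator op, Iterator<IntBatch> source) {
        List<List<Integer>> results = new ArrayList<>();
        while (source.hasNext()) {
            results.add(op.process(source.next()).values);
        }
        return results;
    }
}
```

Running OperatorHarness.run(new DropNulls(), new FixedBatchSource(IntBatch.of(),
IntBatch.of(null, null), IntBatch.of(1, null, 2))) yields three output
batches: two empty ones and one containing 1 and 2. No SQL query, input file
or plan is involved, so a planning change cannot silently alter what the
test exercises.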
