drill-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: [DISCUSS] Publishing advanced/functional tests
Date Tue, 04 Aug 2015 18:23:36 GMT
Thanks for your inputs.

One issue with just publishing the tests in their current state is that
the framework redistributes the tpch, tpcds, and yelp data sets without
requiring users to accept the relevant licenses. A good number of tests use
these data sets. Any thoughts on how to handle this?

- Rahul
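
One possible way to handle the license question, offered as an illustration
rather than anything proposed in the thread: have the framework refuse to
fetch a data set until the user has explicitly recorded acceptance of its
terms. The sketch below is a minimal Python example under stated assumptions.
The function names, the JSON acceptance file, and the placeholder license
URLs are all hypothetical and are not part of the Drill test framework.

```python
import json
from pathlib import Path

# Hypothetical registry: data set name -> where its license terms live.
# These URLs are placeholders, not the real license locations.
DATASET_LICENSES = {
    "tpch": "https://example.invalid/tpch-license",
    "tpcds": "https://example.invalid/tpcds-license",
    "yelp": "https://example.invalid/yelp-terms",
}


class LicenseNotAccepted(Exception):
    """Raised when a data set is requested before its terms were accepted."""


def load_accepted(path):
    """Return the set of data set names recorded as accepted in `path`."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()))


def record_acceptance(path, dataset):
    """Persist that the user accepted the terms for `dataset`."""
    if dataset not in DATASET_LICENSES:
        raise KeyError(f"unknown data set: {dataset}")
    accepted = load_accepted(path)
    accepted.add(dataset)
    Path(path).write_text(json.dumps(sorted(accepted)))


def require_license(path, dataset):
    """Raise unless the terms for `dataset` were accepted.

    A download step would call this before fetching any file.
    """
    if dataset not in DATASET_LICENSES:
        raise KeyError(f"unknown data set: {dataset}")
    if dataset not in load_accepted(path):
        raise LicenseNotAccepted(
            f"Accept the terms at {DATASET_LICENSES[dataset]} "
            f"before downloading {dataset}"
        )
```

With a gate like this, a suite that depends on tpch would call
require_license("accepted.json", "tpch") before its download step and fail
with a pointer to the terms instead of silently fetching the data.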

On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> +1.  Get it out there.
>
>
>
> On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <jacques@dremio.com>
> wrote:
>
> > Hey Rahul,
> >
> > My suggestion would be to lower the bar--do the absolute bare minimum to
> > get the tests out there.  For example, simply remove proprietary
> > information and then get it on a public github (whether your personal
> > github or a corporate one).  From there, people can help by submitting
> > pull requests to improve the infrastructure and harness.  Making things
> > easier is something that can be done over time.  For example, we've had
> > offers from a couple different Linux Admins to help on something.  I'm
> > sure that they could help with a number of the items you've identified.
> > In the meantime, we risk patches being merged that have less than
> > complete testing.
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Jacques,
> > >
> > > I am breaking down steps 1,2 & 3 into sub-tasks so we can
> > > add/prioritize these tasks
> > >
> > > (Columns: Item #, Task, Sub-Task, Comments, Priority)
> > >
> > > 1. *Publish the tests*
> > >
> > >    - Remove Proprietary Data & Queries
> > >      Priority: 0
> > >
> > >    - Redact Proprietary Data/Queries
> > >
> > >    - Move tests into drill repo
> > >      Comments: This requires some refactoring to the framework code
> > >      since the test framework uses a 2-level directory structure
> > >
> > >    - Organize the tests using a label based approach
> > >      Comments: This involves code changes and moving a lot of files.
> > >      When doing a one time push it might be better to do this before
> > >      publishing the tests?
> > >
> > >    - Each suite should be independent
> > >      Comments: Some suites wrongly assume that the data is present.
> > >      They should be identified and fixed
> > >
> > >    - Cleanup hardcoded dependencies during data generation
> > >      Comments: Some data-gen scripts have hard-coded references
> > >
> > >    - Cleanup downloads
> > >      Comments: The same dataset is being downloaded multiple times by
> > >      different suites
> > >
> > >    - Licenses for downloads
> > >      Comments: The framework downloads some files automatically. These
> > >      files are publicly available. However before downloading them
> > >      users need to agree to certain terms. By using the framework
> > >      users might be skipping this step. We should look into this
> > >
> > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > >
> > > 3. *Local debugging of tests*
> > >
> > >    - Add an optional maven target for running tests on a local machine
> > >      Comments: Tests can launch an embedded drillbit or they can
> > >      connect to a running drillbit through zookeeper
> > >
> > >    - Running suites which require additional setup (hive, hbase etc)
> > >      should be made optional
> > >
> > > 4. *Documentation*
> > >
> > >    - Running Tests (options available and also listing the assumed
> > >      defaults)
> > >
> > >    - Explaining how tests are organized
> > >
> > >    - Process for adding a new suite
> > >
> > >
> > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <jacques@dremio.com>
> > > wrote:
> > >
> > > > Let's get number one done (tests out there so all community members
> > > > can run them).  Then the whole community can work together to solve
> > > > the rest.
> > > >
> > > > I don't think the base install should include integration test
> > > > execution.  I do think the tests should be in the main repo (as
> > > > opposed to a secondary).
> > > >
> > > > We should strive to ultimately make running these integration tests a
> > > > requirement for merging.  We need to complete all the steps before we
> > > > can impose that.  I should be able to help on the global run component
> > > > and supporting infrastructure.
> > > >
> > > > J
> > > >
> > > >
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > challapallirahul@gmail.com> wrote:
> > > >
> > > > > Ramana,
> > > > >
> > > > > You are right. We are trying to address multiple issues here, but
> > > > > not with a single solution. I am summarizing them:
> > > > >
> > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > environment. Parth had a suggestion (#4) in his original email.
> > > > > 3. Developers should be able to debug the majority of the tests on
> > > > > their local environment. I made a few suggestions above in this
> > > > > regard
> > > > >
> > > > > - Rahul
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com>
> > > wrote:
> > > > >
> > > > > > One important thing which we need to be clear on here is what
> > > > > > are we trying to address?
> > > > > >
> > > > > > I feel there are two separate issues here and I do not think one
> > > > > > solution will fit both the issues.
> > > > > >
> > > > > >    1. Allowing developers to run tests on their local box so they
> > > > > >    know the changes they have are not completely wrong.
> > > > > >    2. Allowing transparency in the integration tests process which
> > > > > >    is currently a black box.
> > > > > >
> > > > > > 1 is needed for developers to make changes and have an idea that
> > > > > > their changes are not going to fail tests en masse in the
> > > > > > integration suite. 2 is needed because it's a prerequisite for
> > > > > > changes to be committed.
> > > > > >
> > > > > >
> > > > > > Regards
> > > > > > Ramana
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > challapallirahul@gmail.com> wrote:
> > > > > >
> > > > > > > Ramana,
> > > > > > >
> > > > > > > Let me fill in more details.
> > > > > > >
> > > > > > > 1. Before we accept a patch we want to make sure the tests run
> > > > > > > in a cluster environment. No exceptions here.
> > > > > > > 2. We want the contributors to be able to debug the failing
> > > > > > > tests on their laptops in as many cases as possible. This
> > > > > > > requires:
> > > > > > >         1. Tests should run on top of a local file system.
> > > > > > > (Tests can launch an embedded drillbit or they can connect to a
> > > > > > > running drillbit through zookeeper)
> > > > > > >         2. Running suites which require additional setup (hive,
> > > > > > > hbase etc) should be made optional and sufficient documentation
> > > > > > > should be provided for enabling and disabling these tests.
> > > > > > > 3. In my opinion making these new tests part of drill would
> > > > > > > make it easier for the developers to debug and run tests instead
> > > > > > > of having a different repository. But as you said it might bloat
> > > > > > > the drill project
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <
> > > ted.dunning@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > integrates a continuous integration system so that every time
> > > > > > > > a JIRA is marked as patch-available, the associated patch
> > > > > > > > attached to the bug will have integration tests run against
> > > > > > > > it.  I believe that there has been some process to use git
> > > > > > > > hashes instead of patches.  The CI results are put back on the
> > > > > > > > JIRA.
> > > > > > > >
> > > > > > > > This is done using a fairly simple set of scripts.  Apache
> > > > > > > > Yetus is just forming as a direct-to-top-level spinoff from
> > > > > > > > Hadoop
> > > > > > > >
> > > > > > > > Proposal is here (don't be fooled by the fact that it looks
> > > > > > > > like an incubation proposal):
> > > > > > > >
> > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > >
> > > > > > > > Early code can be found here (don't guess that this is very
> > > > > > > > real yet).  More links can be found in the proposal.
> > > > > > > >
> > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > >
> > > > > > > > The project has not yet been formed and there are no mailing
> > > > > > > > lists or git repo yet.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <
> > inramana@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > As someone who worked on this for a while, including it as
> > > > > > > > > part of drill may bloat drill a bit too much. Also not a big
> > > > > > > > > fan of running against an embedded drillbit. Does not
> > > > > > > > > replicate an actual production use case.
> > > > > > > > >
> > > > > > > > > Additionally, setting up hive, hbase and other components
> > > > > > > > > may be painful and unnecessary for most ppl. It would deter
> > > > > > > > > people from ever contributing to drill. We could spin up in
> > > > > > > > > memory hive and hbase but that's similar to an embedded
> > > > > > > > > drill bit. Does not replicate a production scenario.
> > > > > > > > >
> > > > > > > > > Would prefer the hive way with a central Jenkins server
> > > > > > > > > hosted on aws and accessible to everyone.  Users should be
> > > > > > > > > able to submit a git url and that should be able to deploy
> > > > > > > > > and fire off tests. Should then have a way to easily
> > > > > > > > > communicate failures to contributors and if success notify
> > > > > > > > > the committers to commit the change.
> > > > > > > > >
> > > > > > > > > Ps: if hive's way is open source maybe we can look into
> > > > > > > > > reuse rather than doing it from scratch. Esp the Jenkins and
> > > > > > > > > configuration stuff.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Ramana
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <
> parthc@apache.org
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Drill devs use a set of tests that are not available as
> > > > > > > > > > part of the Apache distribution. These tests are a
> > > > > > > > > > pre-requisite for all commits, but are not available to
> > > > > > > > > > any contributors outside the current devs.
> > > > > > > > > >
> > > > > > > > > > This thread is to discuss various options to make these
> > > > > > > > > > tests available.
> > > > > > > > > >
> > > > > > > > > > Assumptions and requirements  -
> > > > > > > > > > 1) A functional test (as opposed to a unit test) needs to
> > > > > > > > > > be closer to the end user environment than a development
> > > > > > > > > > environment. As such, we should be running functional
> > > > > > > > > > tests in a cluster environment, connect using zookeeper
> > > > > > > > > > etc.
> > > > > > > > > > 2) Functional tests will keep increasing in number, get
> > > > > > > > > > more complex and take a longer and longer time to execute
> > > > > > > > > > as we go along.
> > > > > > > > > > 3) Some requirements are:
> > > > > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > > > > > > > > requirements, but not penalize the contributor who has a
> > > > > > > > > > minor fix.
> > > > > > > > > >     b) All parts of the product (especially various
> > > > > > > > > > 'certified' storage plugins like Hive and Hbase) should
> > > > > > > > > > get tested
> > > > > > > > > >     c) It should be easy to debug issues when a test
> > > > > > > > > > fails. Tests should fail deterministically. If a test
> > > > > > > > > > fails, it should always fail and always fail in the same
> > > > > > > > > > way (easier said than done).
> > > > > > > > > >
> > > > > > > > > > Some suggestions -
> > > > > > > > > > 1) Tests should be a top-level maven module within the
> > > > > > > > > > drill project
> > > > > > > > > >         a) We want the integration tests to run as part
> > > > > > > > > > of the drill's maven build process
> > > > > > > > > >         b) The build step for the integration-tests
> > > > > > > > > > module would launch an embedded drillbit and run tests
> > > > > > > > > > against it
> > > > > > > > > >         c) The tests will be a separate target so they
> > > > > > > > > > need not be run all the time
> > > > > > > > > > 2) Tests should be divided into multiple suites that are
> > > > > > > > > > based on components. For example a test suite for testing
> > > > > > > > > > datatypes will contain the tests for various datatypes
> > > > > > > > > > including complex types. A contributor or developer can
> > > > > > > > > > then run these tests more frequently as an issue is being
> > > > > > > > > > addressed and run the entire suite only once before
> > > > > > > > > > commit.
> > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and
> > > > > > > > > > post the results to the JIRA  (Hive does this). Or some
> > > > > > > > > > variant of this idea.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Some questions -
> > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > 2) Are there any technologies we can leverage that will
> > > > > > > > > > make this easier?
> > > > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please feel free to question the assumptions and
> > > > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > > > >
> > > > > > > > > > Parth
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
