drill-dev mailing list archives

From Abhishek Girish <abhishek.gir...@gmail.com>
Subject Re: [DISCUSS] Publishing advanced/functional tests
Date Tue, 04 Aug 2015 18:39:32 GMT
We not only redistribute external data sets as-is, but also include
variants of them (text -> parquet, json, ...). So the challenge here is
not simply disabling automatic downloads in the framework and pointing users
to download the files manually before running it, but also how we will
handle tests which require variants of the data sets. It simply isn't
practical to ask users of the framework to (1) download data-gen manually,
(2) use a specific seed / options before generating data, (3) convert the
data to parquet, etc., and (4) move the results to specific locations inside
their copy of the framework.
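
To make the four manual steps above concrete, here is a minimal sketch of
a per-data-set manifest that refuses to plan any work until the user has
accepted the license. Everything in it is hypothetical and only for
illustration: DatasetSpec, plan_steps, the step strings, and the
placeholder license URL are not part of the Drill test framework.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DatasetSpec:
    """Hypothetical description of one external data set and its variants."""
    name: str                  # e.g. "tpch"
    license_url: str           # placeholder: where the user reads the terms
    generator_args: list       # seed / options passed to the data generator
    variants: list = field(default_factory=lambda: ["text"])  # formats needed


def plan_steps(spec: DatasetSpec, root: Path, accepted: set) -> list:
    """Return the ordered steps a framework could run for one data set.

    Raises PermissionError when the license has not been accepted, so
    nothing is generated or placed implicitly.
    """
    if spec.name not in accepted:
        raise PermissionError(
            f"accept the license for {spec.name} first: {spec.license_url}"
        )
    # Step (1)/(2): generate the base text data with a fixed seed / options.
    steps = [f"generate {spec.name} {' '.join(spec.generator_args)}"]
    for fmt in spec.variants:
        # Step (3): every non-text variant is derived from the text data.
        if fmt != "text":
            steps.append(f"convert {spec.name} text -> {fmt}")
        # Step (4): each variant goes to a fixed location under root.
        steps.append(f"place {spec.name}/{fmt} in {root / spec.name / fmt}")
    return steps
```

With a spec such as DatasetSpec("tpch", "https://example.invalid/license",
["--seed", "42"], ["text", "parquet"]), calling plan_steps with an empty
accepted set raises PermissionError, while passing {"tpch"} returns the
generate, convert, and place steps in order. Keeping the license check in
one place is the point of the sketch; how the data is actually fetched or
generated is left open.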

Something we'll need to know is how other projects are handling benchmark
& other external data sets.

-Abhishek

On Tue, Aug 4, 2015 at 11:23 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Thanks for your inputs.
>
> One issue with just publishing the tests in their current state is that
> the framework redistributes the tpch, tpcds, and yelp data sets without
> requiring users to accept the relevant licenses. A good number of tests use
> these data sets. Any thoughts on how to handle this?
>
> - Rahul
>
> On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > +1.  Get it out there.
> >
> >
> >
> > On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau <jacques@dremio.com>
> > wrote:
> >
> > > Hey Rahul,
> > >
> > > My suggestion would be to lower the bar: do the absolute bare minimum to
> > > get the tests out there.  For example, simply remove proprietary
> > > information and then get it on a public github (whether your personal
> > > github or a corporate one).  From there, people can help by submitting
> > > pull requests to improve the infrastructure and harness.  Making things
> > > easier is something that can be done over time.  For example, we've had
> > > offers from a couple different Linux Admins to help on something.  I'm
> > > sure that they could help with a number of the items you've identified.
> > > In the meantime, we risk patches being merged that have less than
> > > complete testing.
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <
> > > challapallirahul@gmail.com> wrote:
> > >
> > > > Jacques,
> > > >
> > > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can
> > > > add/prioritize these tasks.
> > > >
> > > > Columns: Item # / Task / Sub-Task / Comments / Priority
> > > >
> > > > 1. *Publish the tests*
> > > >
> > > >    - Remove Proprietary Data & Queries
> > > >      Priority: 0
> > > >
> > > >    - Redact Proprietary Data/Queries
> > > >
> > > >    - Move tests into drill repo
> > > >      Comments: This requires some refactoring to the framework code
> > > >      since the test framework uses a 2-level directory structure
> > > >
> > > >    - Organize the tests using a label based approach
> > > >      Comments: This involves code changes and moving a lot of files.
> > > >      When doing a one time push it might be better to do this before
> > > >      publishing the tests?
> > > >
> > > >    - Each suite should be independent
> > > >      Comments: Some suites wrongly assume that the data is present.
> > > >      They should be identified and fixed
> > > >
> > > >    - Cleanup hardcoded dependencies during data generation
> > > >      Comments: Some data-gen scripts have hard-coded references
> > > >
> > > >    - Cleanup downloads
> > > >      Comments: The same dataset is being downloaded multiple times by
> > > >      different suites
> > > >
> > > >    - Licenses for downloads
> > > >      Comments: The framework downloads some files automatically.
> > > >      These files are publicly available. However before downloading
> > > >      them users need to agree to certain terms. By using the framework
> > > >      users might be skipping this step. We should look into this
> > > >
> > > > 2. *Setup a cluster infrastructure to run the pre-commit tests*
> > > >
> > > > 3. *Local debugging of tests*
> > > >
> > > >    - Add an optional maven target for running tests on a local machine
> > > >      Comments: Tests can launch an embedded drillbit or they can
> > > >      connect to a running drillbit through zookeeper
> > > >
> > > >    - Running suites which require additional setup (hive, hbase etc)
> > > >      should be made optional
> > > >
> > > > 4. *Documentation*
> > > >
> > > >    - Running Tests (options available and also listing the assumed
> > > >      defaults)
> > > >
> > > >    - Explaining how tests are organized
> > > >
> > > >    - Process for adding a new suite
> > > >
> > > >
> > > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau <jacques@dremio.com>
> > > > wrote:
> > > >
> > > > > Let's get number one done (tests out there so all community members
> > > > > can run them).  Then the whole community can work together to solve
> > > > > the rest.
> > > > >
> > > > > I don't think the base install should include integration test
> > > > > execution.  I do think the tests should be in the main repo (as
> > > > > opposed to a secondary).
> > > > >
> > > > > We should strive to ultimately make running these integration tests a
> > > > > requirement for merging.  We need to complete all the steps before we
> > > > > can impose that.  I should be able to help on the global run component
> > > > > and supporting infrastructure.
> > > > >
> > > > > J
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jacques Nadeau
> > > > > CTO and Co-Founder, Dremio
> > > > >
> > > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <
> > > > > challapallirahul@gmail.com> wrote:
> > > > >
> > > > > > Ramana,
> > > > > >
> > > > > > You are right. We are trying to address multiple issues here, but
> > > > > > not with a single solution. I am summarizing them:
> > > > > >
> > > > > > 1. Tests should be visible to everyone (Implicit goal)
> > > > > > 2. Before applying a patch we should run tests in a clustered
> > > > > > environment. Parth had a suggestion (#4) in his original email.
> > > > > > 3. Developers should be able to debug the majority of the tests on
> > > > > > their local environment. I made a few suggestions above in this
> > > > > > regard.
> > > > > >
> > > > > > - Rahul
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N <inramana@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > One important thing which we need to be clear on here is: what
> > > > > > > are we trying to address?
> > > > > > >
> > > > > > > I feel there are two separate issues here and I do not think one
> > > > > > > solution will fit both the issues.
> > > > > > >
> > > > > > >    1. Allowing developers to run tests on their local box so they
> > > > > > >    know the changes they have are not completely wrong.
> > > > > > >    2. Allowing transparency in the integration tests process,
> > > > > > >    which is currently a black box.
> > > > > > >
> > > > > > > 1 is needed for developers to make changes and have an idea that
> > > > > > > their changes are not going to fail tests en masse in the
> > > > > > > integration suite. 2 is needed because it's a prerequisite for
> > > > > > > changes to be committed.
> > > > > > >
> > > > > > >
> > > > > > > Regards
> > > > > > > Ramana
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <
> > > > > > > challapallirahul@gmail.com> wrote:
> > > > > > >
> > > > > > > > Ramana,
> > > > > > > >
> > > > > > > > Let me fill in more details.
> > > > > > > >
> > > > > > > > 1. Before we accept a patch we want to make sure the tests run
> > > > > > > > in a cluster environment. No exceptions here.
> > > > > > > > 2. We want the contributors to be able to debug the failing
> > > > > > > > tests on their laptops in as many cases as possible. This
> > > > > > > > requires:
> > > > > > > >         1. Tests should run on top of a local file system.
> > > > > > > > (Tests can launch an embedded drillbit or they can connect to
> > > > > > > > a running drillbit through zookeeper)
> > > > > > > >         2. Running suites which require additional setup (hive,
> > > > > > > > hbase etc) should be made optional and sufficient documentation
> > > > > > > > should be provided for enabling and disabling these tests.
> > > > > > > > 3. In my opinion making these new tests part of drill would
> > > > > > > > make it easier for the developers to debug and run tests
> > > > > > > > instead of having a different repository. But as you said it
> > > > > > > > might bloat the drill project.
> > > > > > > >
> > > > > > > > - Rahul
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > The Hadoop family of projects has some software that
> > > > > > > > > integrates a continuous integration system so that every
> > > > > > > > > time a JIRA is marked as patch-available, the associated
> > > > > > > > > patch attached to the bug will have integration tests run
> > > > > > > > > against it.  I believe that there has been some process to
> > > > > > > > > use git hashes instead of patches.  The CI results are put
> > > > > > > > > back on the JIRA.
> > > > > > > > >
> > > > > > > > > This is done using a fairly simple set of scripts.  Apache
> > > > > > > > > Yetus is just forming as a direct-to-top-level spinoff from
> > > > > > > > > Hadoop.
> > > > > > > > >
> > > > > > > > > Proposal is here (don't be fooled by the fact that it looks
> > > > > > > > > like an incubation proposal):
> > > > > > > > >
> > > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > > >
> > > > > > > > > Early code can be found here (don't guess that this is very
> > > > > > > > > real yet).  More links can be found in the proposal.
> > > > > > > > >
> > > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > > >
> > > > > > > > > The project has not yet been formed and there are no mailing
> > > > > > > > > lists or git repo yet.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <inramana@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > As someone who worked on this for a while, including it as
> > > > > > > > > > part of drill may bloat drill a bit too much. Also not a
> > > > > > > > > > big fan of running against an embedded drillbit. Does not
> > > > > > > > > > replicate an actual production use case.
> > > > > > > > > >
> > > > > > > > > > Additionally, setting up hive, hbase and other components
> > > > > > > > > > may be painful and unnecessary for most ppl. It would
> > > > > > > > > > deter people from ever contributing to drill. We could
> > > > > > > > > > spin up in-memory hive and hbase but that's similar to an
> > > > > > > > > > embedded drillbit. Does not replicate a production
> > > > > > > > > > scenario.
> > > > > > > > > >
> > > > > > > > > > Would prefer the hive way with a central Jenkins server
> > > > > > > > > > hosted on aws and accessible to everyone.  Users should be
> > > > > > > > > > able to submit a git url and that should be able to deploy
> > > > > > > > > > and fire off tests. Should then have a way to easily
> > > > > > > > > > communicate failures to contributors and, on success,
> > > > > > > > > > notify the committers to commit the change.
> > > > > > > > > >
> > > > > > > > > > Ps: if hive's way is open source maybe we can look into
> > > > > > > > > > reuse rather than doing it from scratch. Esp the Jenkins
> > > > > > > > > > and configuration stuff.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Ramana
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <parthc@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Drill devs use a set of tests that are not available as
> > > > > > > > > > > part of the Apache distribution. These tests are a
> > > > > > > > > > > pre-requisite for all commits, but are not available to
> > > > > > > > > > > any contributors outside the current devs.
> > > > > > > > > > >
> > > > > > > > > > > This thread is to discuss various options to make these
> > > > > > > > > > > tests available.
> > > > > > > > > > >
> > > > > > > > > > > Assumptions and requirements -
> > > > > > > > > > > 1) A functional test (as opposed to a unit test) needs to
> > > > > > > > > > > be closer to the end user environment than a development
> > > > > > > > > > > environment. As such, we should be running functional
> > > > > > > > > > > tests in a cluster environment, connect using zookeeper
> > > > > > > > > > > etc.
> > > > > > > > > > > 2) Functional tests will keep increasing in number, get
> > > > > > > > > > > more complex and take a longer and longer time to execute
> > > > > > > > > > > as we go along.
> > > > > > > > > > > 3) Some requirements are:
> > > > > > > > > > >     a) We want to be strict in enforcing the pre-commit
> > > > > > > > > > > requirements, but not penalize the contributor who has a
> > > > > > > > > > > minor fix.
> > > > > > > > > > >     b) All parts of the product (especially various
> > > > > > > > > > > 'certified' storage plugins like Hive and Hbase) should
> > > > > > > > > > > get tested.
> > > > > > > > > > >     c) It should be easy to debug issues when a test
> > > > > > > > > > > fails. Tests should fail deterministically. If a test
> > > > > > > > > > > fails, it should always fail and always fail in the same
> > > > > > > > > > > way (easier said than done).
> > > > > > > > > > >
> > > > > > > > > > > Some suggestions -
> > > > > > > > > > > 1) Tests should be a top-level maven module within the
> > > > > > > > > > > drill project
> > > > > > > > > > >         a) We want the integration tests to run as part
> > > > > > > > > > > of drill's maven build process
> > > > > > > > > > >         b) The build step for the integration-tests
> > > > > > > > > > > module would launch an embedded drillbit and run tests
> > > > > > > > > > > against it
> > > > > > > > > > >         c) The tests will be a separate target so they
> > > > > > > > > > > need not be run all the time
> > > > > > > > > > > 2) Tests should be divided into multiple suites that are
> > > > > > > > > > > based on components. For example a test suite for testing
> > > > > > > > > > > datatypes will contain the tests for various datatypes
> > > > > > > > > > > including complex types. A contributor or developer can
> > > > > > > > > > > then run these tests more frequently as an issue is being
> > > > > > > > > > > addressed and run the entire suite only once before
> > > > > > > > > > > commit.
> > > > > > > > > > > 3) Provide the tests as a hosted service
> > > > > > > > > > > 4) Setup a bot to fire the test on an AWS cluster and
> > > > > > > > > > > post the results to the JIRA (Hive does this). Or some
> > > > > > > > > > > variant of this idea.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Some questions -
> > > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > > 2) Are there any technologies we can leverage that will
> > > > > > > > > > > make this easier?
> > > > > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Please feel free to question the assumptions and
> > > > > > > > > > > requirements. Be creative with your suggestions.
> > > > > > > > > > >
> > > > > > > > > > > Parth
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
