From: rahul challapalli
Date: Tue, 4 Aug 2015 11:23:36 -0700
Subject: Re: [DISCUSS] Publishing advanced/functional tests
To: dev@drill.apache.org

Thanks for your inputs.

One issue with just publishing the tests in their current state is that the framework redistributes the TPC-H, TPC-DS and Yelp data sets without requiring users to accept their respective licenses. A good number of tests use these data sets. Any thoughts on how to handle this?

- Rahul

On Wed, Jul 29, 2015 at 12:07 AM, Ted Dunning wrote:
> +1. Get it out there.
>
> On Tue, Jul 28, 2015 at 10:12 PM, Jacques Nadeau wrote:
> > Hey Rahul,
> >
> > My suggestion would be to lower the bar: do the absolute bare minimum to get the tests out there. For example, simply remove proprietary information and then put the tests on a public GitHub repository (whether your personal GitHub or a corporate one). From there, people can help by submitting pull requests to improve the infrastructure and harness.
> > Making things easier is something that can be done over time. For example, we've had offers from a couple of different Linux admins to help with something, and I'm sure they could help with a number of the items you've identified. In the meantime, we risk patches being merged that have less than complete testing.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Mon, Jul 27, 2015 at 2:16 PM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > Jacques,
> > >
> > > I am breaking down steps 1, 2 & 3 into sub-tasks so we can add and prioritize these tasks.
> > >
> > > (Table columns: Item #, Task, Sub-Task, Comments, Priority)
> > >
> > > 1. *Publish the tests*
> > >    - Remove proprietary data & queries (priority: 0)
> > >    - Redact proprietary data/queries
> > >    - Move tests into the Drill repo: This requires some refactoring of the framework code, since the test framework uses a 2-level directory structure.
> > >    - Organize the tests using a label-based approach: This involves code changes and moving a lot of files. Since we are doing a one-time push, might it be better to do this before publishing the tests?
> > >    - Each suite should be independent: Some suites wrongly assume that the data is present. They should be identified and fixed.
> > >    - Clean up hardcoded dependencies during data generation: Some data-gen scripts have hard-coded references.
> > >    - Clean up downloads: The same dataset is being downloaded multiple times by different suites.
> > >    - Licenses for downloads: The framework downloads some files automatically. These files are publicly available. However, before downloading them, users need to agree to certain terms. By using the framework, users might be skipping this step.
> > >      We should look into this.
> > > 2. *Set up a cluster infrastructure to run the pre-commit tests*
> > > 3. *Local debugging of tests*
> > >    - Add an optional Maven target for running tests on a local machine: Tests can launch an embedded drillbit or they can connect to a running drillbit through ZooKeeper.
> > >    - Running suites which require additional setup (Hive, HBase, etc.) should be made optional.
> > > 4. *Documentation*
> > >    - Running tests (the options available, also listing the assumed defaults)
> > >    - Explaining how tests are organized
> > >    - Process for adding a new suite
> > >
> > > On Fri, Jul 24, 2015 at 1:40 PM, Jacques Nadeau wrote:
> > > > Let's get number one done (tests out there so all community members can run them). Then the whole community can work together to solve the rest.
> > > >
> > > > I don't think the base install should include integration test execution. I do think the tests should be in the main repo (as opposed to a secondary one).
> > > >
> > > > We should strive to ultimately make running these integration tests a requirement for merging. We need to complete all the steps before we can impose that. I should be able to help with the global run component and supporting infrastructure.
> > > >
> > > > J
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Jul 24, 2015 at 1:29 PM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > > Ramana,
> > > > >
> > > > > You are right. We are trying to address multiple issues here, but not with a single solution. To summarize them:
> > > > >
> > > > > 1. Tests should be visible to everyone (implicit goal).
> > > > > 2. Before applying a patch, we should run tests in a clustered environment. Parth had a suggestion (#4) in his original email.
> > > > > 3. Developers should be able to debug the majority of the tests in their local environment. I made a few suggestions above in this regard.
> > > > >
> > > > > - Rahul
> > > > >
> > > > > On Fri, Jul 24, 2015 at 10:40 AM, Ramana I N wrote:
> > > > > > One important thing we need to be clear on here is: what are we trying to address?
> > > > > >
> > > > > > I feel there are two separate issues here, and I do not think one solution will fit both of them.
> > > > > >
> > > > > > 1. Allowing developers to run tests on their local box so they know the changes they have are not completely wrong.
> > > > > > 2. Allowing transparency in the integration test process, which is currently a black box.
> > > > > >
> > > > > > 1 is needed so developers can make changes with some confidence that those changes are not going to fail tests en masse in the integration suite. 2 is needed because it's a prerequisite for changes to be committed.
> > > > > >
> > > > > > Regards
> > > > > > Ramana
> > > > > >
> > > > > > On Fri, Jul 24, 2015 at 10:28 AM, rahul challapalli <challapallirahul@gmail.com> wrote:
> > > > > > > Ramana,
> > > > > > >
> > > > > > > Let me fill in more details.
> > > > > > >
> > > > > > > 1. Before we accept a patch, we want to make sure the tests run in a cluster environment. No exceptions here.
> > > > > > > 2. We want contributors to be able to debug failing tests on their laptops in as many cases as possible. This requires:
> > > > > > >    1. Tests should run on top of a local file system. (Tests can launch an embedded drillbit or they can connect to a running drillbit through ZooKeeper.)
> > > > > > >    2. Running suites which require additional setup (Hive, HBase, etc.) should be made optional, and sufficient documentation should be provided for enabling and disabling these tests.
> > > > > > > 3. In my opinion, making these new tests part of Drill would make it easier for developers to debug and run tests than having a different repository. But as you said, it might bloat the Drill project.
> > > > > > >
> > > > > > > - Rahul
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 9:42 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > > > > > > The Hadoop family of projects has some software that integrates a continuous integration system so that every time a JIRA is marked as patch-available, the patch attached to the bug has integration tests run against it. I believe that there has been some process to use git hashes instead of patches. The CI results are put back on the JIRA.
> > > > > > > >
> > > > > > > > This is done using a fairly simple set of scripts. Apache Yetus is just forming as a direct-to-top-level spinoff from Hadoop.
> > > > > > > >
> > > > > > > > The proposal is here (don't be fooled by the fact that it looks like an incubation proposal):
> > > > > > > >
> > > > > > > > http://wiki.apache.org/incubator/YetusProposal
> > > > > > > >
> > > > > > > > Early code can be found here (don't guess that this is very real yet).
> > > > > > > > More links can be found in the proposal.
> > > > > > > >
> > > > > > > > https://github.com/sekikn/pre-yetus/tree/master/precommit/docs
> > > > > > > >
> > > > > > > > The project has not yet been formed, and there are no mailing lists or git repo yet.
> > > > > > > >
> > > > > > > > On Fri, Jul 24, 2015 at 9:25 AM, Ramana I N <inramana@gmail.com> wrote:
> > > > > > > > > As someone who worked on this for a while, I think including it as part of Drill may bloat Drill a bit too much. I'm also not a big fan of running against an embedded drillbit, because it does not replicate an actual production use case.
> > > > > > > > >
> > > > > > > > > Additionally, setting up Hive, HBase and other components may be painful and unnecessary for most people. It would deter people from ever contributing to Drill. We could spin up in-memory Hive and HBase, but that's similar to an embedded drillbit: it does not replicate a production scenario.
> > > > > > > > >
> > > > > > > > > I would prefer the Hive way, with a central Jenkins server hosted on AWS and accessible to everyone. Users should be able to submit a git URL, and that should deploy the build and fire off the tests. There should then be a way to easily communicate failures to contributors and, on success, notify the committers to commit the change.
> > > > > > > > >
> > > > > > > > > PS: If Hive's way is open source, maybe we can look into reusing it rather than doing it from scratch.
> > > > > > > > > Especially the Jenkins and configuration stuff.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > Ramana
> > > > > > > > >
> > > > > > > > > On Thursday, July 23, 2015, Parth Chandra <parthc@apache.org> wrote:
> > > > > > > > > > Drill devs use a set of tests that are not available as part of the Apache distribution. These tests are a prerequisite for all commits, but are not available to any contributors outside the current devs.
> > > > > > > > > >
> > > > > > > > > > This thread is to discuss various options to make these tests available.
> > > > > > > > > >
> > > > > > > > > > Assumptions and requirements:
> > > > > > > > > > 1) A functional test (as opposed to a unit test) needs to be closer to the end-user environment than to a development environment. As such, we should be running functional tests in a cluster environment, connecting using ZooKeeper, etc.
> > > > > > > > > > 2) Functional tests will keep increasing in number, get more complex, and take longer and longer to execute as we go along.
> > > > > > > > > > 3) Some requirements are:
> > > > > > > > > >    a) We want to be strict in enforcing the pre-commit requirements, but not penalize the contributor who has a minor fix.
> > > > > > > > > >    b) All parts of the product should get tested (especially the various 'certified' storage plugins like Hive and HBase).
> > > > > > > > > >    c) It should be easy to debug issues when a test fails.
> > > > > > > > > >       Tests should fail deterministically. If a test fails, it should always fail, and always fail in the same way (easier said than done).
> > > > > > > > > >
> > > > > > > > > > Some suggestions:
> > > > > > > > > > 1) Tests should be a top-level Maven module within the Drill project.
> > > > > > > > > >    a) We want the integration tests to run as part of Drill's Maven build process.
> > > > > > > > > >    b) The build step for the integration-tests module would launch an embedded drillbit and run tests against it.
> > > > > > > > > >    c) The tests will be a separate target so they need not be run all the time.
> > > > > > > > > > 2) Tests should be divided into multiple suites based on components. For example, a test suite for testing data types will contain the tests for various data types, including complex types. A contributor or developer can then run these tests more frequently while an issue is being addressed, and run the entire suite only once before commit.
> > > > > > > > > > 3) Provide the tests as a hosted service.
> > > > > > > > > > 4) Set up a bot to fire the tests on an AWS cluster and post the results to the JIRA (Hive does this), or some variant of this idea.
> > > > > > > > > >
> > > > > > > > > > Some questions:
> > > > > > > > > > 1) What do some other projects do?
> > > > > > > > > > 2) Are there any technologies we can leverage that will make this easier?
> > > > > > > > > > 3) How do we make it easier to debug failing tests?
> > > > > > > > > >
> > > > > > > > > > Please feel free to question the assumptions and requirements. Be creative with your suggestions.
> > > > > > > > > >
> > > > > > > > > > Parth