From: Christopher Nguyen
Date: Sat, 12 Oct 2013 14:22:16 -0700
Subject: Re: Test coverage of Spark
To: dev@spark.incubator.apache.org

Roman, an area I think would (a) have high impact and (b) is relatively poorly covered is performance analysis. I'm sure most teams are doing this internally at their respective companies, but there is no shared code base and no shared wisdom about what we're finding and improving.

For example, consider the task of loading a table from disk into memory with Shark. We're getting conflicting data about how much of this is CPU-bound vs. I/O-bound. Our effort to track this down should be shareable somehow, and it would benefit from others' findings. Of course the answer depends on the particular configuration, but a lot of the test-harness code and scripts can be shared. And individual findings, even if (especially if) they conflict, are very valuable when well documented.
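One cheap first signal for the CPU-bound vs. I/O-bound question is to compare a task's CPU time with its wall-clock time: a ratio near 1 means the thread was busy computing, a ratio near 0 means it was mostly waiting. The sketch below is a hypothetical illustration, not Spark or Shark code; the class `CpuVsWall`, its `profile` method, and the two stand-in workloads are assumptions. It only measures the calling thread via the standard `java.lang.management.ThreadMXBean`, whereas a real Shark table load runs across many threads and executors, so treat it as a rough per-thread indicator rather than a full analysis.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWall {

    /** Wall-clock and CPU time for one task, in nanoseconds (hypothetical helper type). */
    public static final class Profile {
        public final long wallNanos;
        public final long cpuNanos;

        Profile(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }

        /** Fraction of wall-clock time the calling thread spent on the CPU (roughly 0..1). */
        public double cpuFraction() {
            return wallNanos == 0 ? 0.0 : (double) cpuNanos / wallNanos;
        }
    }

    /** Runs the task on the current thread and records its wall-clock and CPU time. */
    public static Profile profile(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported by this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true); // otherwise getCurrentThreadCpuTime() returns -1
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wallEnd = System.nanoTime();
        long cpuEnd = mx.getCurrentThreadCpuTime();
        return new Profile(wallEnd - wallStart, cpuEnd - cpuStart);
    }

    public static void main(String[] args) {
        // CPU-heavy stand-in: a tight arithmetic loop with a data dependency.
        Profile busy = profile(() -> {
            long acc = 0;
            for (long i = 0; i < 200_000_000L; i++) {
                acc += i ^ (acc >>> 3);
            }
            if (acc == 42) {
                System.out.println(acc); // uses acc so the loop is not dead code
            }
        });
        // Wait-heavy stand-in: sleeping approximates blocking on disk or network.
        Profile waiting = profile(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("busy:    cpu/wall = %.2f%n", busy.cpuFraction());
        System.out.printf("waiting: cpu/wall = %.2f%n", waiting.cpuFraction());
    }
}
```

On an unloaded machine the busy loop should report a ratio close to 1 and the sleep a ratio close to 0; a real table load would land somewhere in between, and I/O waits or decompression work happening on other threads will not show up in a single-thread measurement like this one.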
There is a benchmark effort described at https://amplab.cs.berkeley.edu/benchmark/, but it addresses a slightly different goal. You could consider this perf-analysis work as part of that, or as its own effort.

This may be more than you were looking to own, but given your stated enthusiasm :) I wanted to throw the idea out there.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
linkedin.com/in/ctnguyen

On Sat, Oct 12, 2013 at 1:48 PM, Роман Ткаленко wrote:

> Hello.
> I'm trying to dive into Spark's sources on a deeper-than-mere-glance level,
> and I find that beginning with writing unit tests is a good way to do it. So,
> basically, I'm wondering if there are points to which I could specifically
> apply my enthusiasm, i.e. are there parts that are uncovered or not covered
> well enough, for which I could write some tests?
> I'm also wondering about the state of the Apache-hosted JIRA for Spark; I
> currently can't see any entries in there. Should I look for them in the GitHub
> mirror, or still in the antecedent JIRA instance at
> http://spark-project.atlassian.net/?
> Regards,
> Roman.