Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E6AAF200BEF for ; Wed, 21 Dec 2016 02:56:35 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id E5423160B33; Wed, 21 Dec 2016 01:56:35 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 161C4160B29 for ; Wed, 21 Dec 2016 02:56:34 +0100 (CET) Received: (qmail 79258 invoked by uid 500); 21 Dec 2016 01:56:34 -0000 Mailing-List: contact dev-help@asterixdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@asterixdb.apache.org Delivered-To: mailing list dev@asterixdb.apache.org Received: (qmail 79235 invoked by uid 99); 21 Dec 2016 01:56:34 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Dec 2016 01:56:34 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 934461A9F16 for ; Wed, 21 Dec 2016 01:56:33 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.929 X-Spam-Level: ** X-Spam-Status: No, score=2.929 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id sJeDC3YlfqYx for ; Wed, 21 Dec 2016 01:56:31 +0000 (UTC) Received: from mail-it0-f50.google.com (mail-it0-f50.google.com [209.85.214.50]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 1F3795FC87 for ; Wed, 21 Dec 2016 01:56:31 +0000 (UTC) Received: by mail-it0-f50.google.com with SMTP id 75so70368160ite.1 for ; Tue, 20 Dec 2016 17:56:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=4zXKLwVgS58VMglQe9XTQO+F6rmDHkj/uLifyBFglPE=; b=BltwUDRRAr4lNV4TVT5RI7IuBhrjUDwLplaRLfQRnlinyaUDG8GGCnJ79SI5yxjazJ 9S5VT3nWjeAz3uV4Z0e7T06vxTOVP1uFaLEhOfuFaHXkEzsLRz6heVYVrPiSa1hdXfhv hXintyC4D7NhYc8HSEtRGAZ2j9jmwyw0SOCtkAVU1+w+6E2ySOxxgvQpEohQx39pip9U 4LOZLfZ+wEAa/0hmzedOBSYd3obuFUuXd74PPmx+UHiMeVS6ChaVVUpUkwjFvGnuVSQI dYFH2GwClLmbCmAEAJKfIozF8UVJfPsXQjSSeLFMrhfWu6RPLxC1HAH/DtiLRcWJlhaM +2cA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=4zXKLwVgS58VMglQe9XTQO+F6rmDHkj/uLifyBFglPE=; b=sDasUeSZd9I8DMNdj2MA6M8mDXcP0/F2rlLFGHveF0QtxHESpzSN197jzfW6fhWE2Z wVFyZPngUsKP70HjuiIB4MPnoGD62l/v98/tul1fm2FFmi3cLxmgz7v26MIPmRqcpXtc xOAAX+NyoPD4RmcXtSwvA1OreIM5NxjUuX3vhNqQNnYi8qvNFv3Pd935s1iYyQLqZrEe J+WP3aquHDVbGGr9+IFe5WlfkNiByQRW9upsMYbb1XtbSL1gQIKIKOTKtNHWj89+AKLz vv317+uYwOWKvAJTQYt8l5OXxOnqfMmkL9Q0fCZIap32FUIVLwakXiqLKbRLEVhP3o8C 6tBA== X-Gm-Message-State: AIkVDXJse2sAz1Y2xLZuLOVSvKhZX9ppnAg6eJ6fnQ0F0aOv6cXWIUvKsLIIWDLeaEbEDS+DZEt6Dhpn3wktdg== X-Received: by 10.36.159.3 with SMTP id c3mr5598218ite.50.1482285366163; Tue, 20 Dec 2016 17:56:06 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.19.87 with HTTP; Tue, 20 Dec 2016 17:55:25 -0800 (PST) In-Reply-To: References: <009201d25b1c$4f50c130$edf24390$@gmail.com> <00b601d25b22$e2dfa910$a89efb30$@gmail.com> From: mingda li Date: Tue, 20 Dec 2016 17:55:25 -0800 Message-ID: Subject: Re: Time of Multiple Joins in AsterixDB To: dev@asterixdb.apache.org Cc: Michael Carey , Tyson Condie Content-Type: multipart/alternative; boundary=94eb2c089eccb4bca1054421743e archived-at: Wed, 21 Dec 2016 01:56:36 -0000 --94eb2c089eccb4bca1054421743e Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable BTW: For the 1-10G figure's label, 'W' means wrong order while 'R' means right order. We always put the wrong order join left side to right order join. And for the 1-100G figure's, we didn't add labels but the wrong order join is also in left side. On Tue, Dec 20, 2016 at 5:44 PM, mingda li wrote: > Oh, sure. When we test the 100G multiple join, we find AsterixDB is slowe= r > than Spark (but still faster than Pig and Hive). > I can share with you the both plots: 1-10G.eps and 1-100G.eps. (We will > only use 1-10G.eps in our paper). > And thanks for Ian's advice:* The dev list generally strips attachments. > Maybe you can just put the config inline? Or link to a pastebin/gist?* > I know why you can't see the attachments. So I move the plots with two > documents to my Dropbox. > You can find the > 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=3D= 0 > 1-100G.eps here:https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl= =3D0 > cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf? > dl=3D0 > CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/ > CompleteQuery.pdf?dl=3D0 > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie > wrote: > >> Mingda: Please also share the numbers for 100GB, which show AsterixDB no= t >> quite doing as well as Spark. These 100GB results will not be in our >> submission version, since they=E2=80=99re not needed for the desired mes= sage: >> picking the right join order matters. Nevertheless, I=E2=80=99d like to = get a >> better understanding of what=E2=80=99s going on in the larger dataset re= gime. >> >> >> >> -Tyson >> >> >> >> From: Yingyi Bu [mailto:buyingyi@gmail.com] >> Sent: Tuesday, December 20, 2016 4:30 PM >> To: dev@asterixdb.apache.org >> Cc: Michael Carey ; Tyson Condie < >> tcondie.ucla@gmail.com> >> Subject: Re: Time of Multiple Joins in AsterixDB >> >> >> >> Hi Mingda, >> >> >> >> It looks that you didn't attach the pdf? >> >> Thanks! >> >> >> >> Best, >> >> Yingyi >> >> >> >> On Tue, Dec 20, 2016 at 4:15 PM, mingda li > > wrote: >> >> Sorry for the wrong version of cc.conf. I convert it to pdf version as >> attachment. >> >> >> >> On Tue, Dec 20, 2016 at 4:06 PM, mingda li > > wrote: >> >> Dear all, >> >> >> >> I am testing different systems' (AsterixDB, Spark, Hive, Pig) multiple >> joins to see if there is a big difference with different join order. Thi= s >> is the reason for our research on multiple join and the result will appp= ear >> in our paper which is to be submitted to VLDB soon. Could you help us to >> make sure that the test results make sense for AsterixDB? >> >> >> >> We configure the AsterixDB 0.8.9 ( use asterix-server-0.8.9-SNAPSHOT-bin= ary-assembly) >> in our cluster of 16 machines, each with a 3.40GHz i7 processor (4 cores >> and 2 hyper-threads per core), 32GB of RAM and 1TB of disk capacity. The >> operating system is 64-bit Ubuntu 12.04. JDK version 1.8.0. During >> configuration, I follow the NCService instruction here >> https://ci.apache.org/projects/asterixdb/ncservice.html. And I set the >> cc.conf as in attachment. (Each node work as nc and the first node also >> work as cc). >> >> >> >> For experiment, we use 3 fact tables from TPC-DS: inventory; >> catalog_sales; catalog_returns with TPC-DS scale factor 1g and 10g. The >> multiple join query we use in AsterixDB are as following: >> >> >> >> Good Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 >> JOIN catalog_returns cr1 >> >> ON (cs1.cs_order_number =3D cr1.cr_order_number AND cs1.cs_item_sk =3D >> cr1.cr_item_sk)) m1 JOIN inventory i1 ON i1.inv_item_sk =3D cs1.cs_item= _sk; >> >> >> >> Bad Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 >> JOIN inventory i1 ON cs1.cs_item_sk =3D i1.inv_item_sk) m1 JOIN >> catalog_returns cr1 ON (cs1.cs_order_number =3D cr1.cr_order_number AND >> cs1.cs_item_sk =3D cr1.cr_item_sk); >> >> >> >> We load the data to AsterixDB firstly and run the two different queries. >> (The complete version of all queries for AsterixDB is in attachment) We >> assume the data has already been stored in AsterixDB and only count the >> time for multiple join. >> >> >> >> Meanwhile, we use the same dataset and query to test Spark, Pig and Hive= . >> The result is shown in the attachment's figure. And you can find >> AsterixDB's time is always better than others no matter good or bad >> order:-) (BTW, the y scale of figure is time in log scale. You can see t= he >> time by the label of each bar.) >> >> >> >> Thanks for your help. >> >> >> >> Bests, >> >> Mingda >> >> >> >> >> >> >> >> >> >> > --94eb2c089eccb4bca1054421743e--