From: mingda li
Date: Sun, 25 Dec 2016 16:32:25 -0800
Subject: Re: Time of Multiple Joins in AsterixDB
To: dev@asterixdb.apache.org
Cc: Michael Carey, Tyson Condie

Thanks for Yingyi's advice. I used the new configuration cc.conf
(https://www.dropbox.com/s/xoycm5i0wmr7uet/cc.conf?dl=0) and saved about half
the time for multiple joins at every scale and join order on AsterixDB. I have
updated the figure we will use in our VLDB paper:
https://www.dropbox.com/s/op7x938yxwpwgit/1-10G.eps?dl=0
('W' means wrong order and 'R' means right order; the wrong-order join is
always plotted to the left of the right-order join.) You can see that AsterixDB
is always far better than the other systems. The plot including 100G is also
updated here: https://www.dropbox.com/s/epbnqvqx5fdgtr4/1-100G.eps?dl=0
(the wrong order is again on the left).

Bests and happy holidays :-)

Mingda

On Wed, Dec 21, 2016 at 8:42 PM, Yingyi Bu wrote:

> Cool, thanks, Mingda!
> Look forward to the new numbers!
>
> Best,
> Yingyi
>
> On Wed, Dec 21, 2016 at 7:13 PM, mingda li wrote:
>
> > Dear Yingyi,
> >
> > Thanks for your suggestion. I have reset AsterixDB and retested it with the
> > new environment on 100G and 10G. All the tests (good and bad order) now run
> > about twice as fast.
> > I will finish all the tests and update the results later.
> >
> > Bests,
> > Mingda
> >
> > On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu wrote:
> >
> > > Hi Mingda,
> > >
> > >     I think that in your setting, a better configuration for AsterixDB
> > > might be to use 64 partitions, i.e., 4 cores * 16.
> > >     1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
> > >        iodevices=/home/clash/asterixStorage/asterixdb5/red16
> > >        -->
> > >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
> > >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
> > >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
> > >        iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
> > >
> > >     2. Assuming you have 64 partitions, 31.01G/64 ~= 0.5G. That means
> > > you'd better have a 512MB memory budget for each joiner so as to make the
> > > join memory-resident.
> > >        To achieve that, in the cc section in the cc.conf, you could add:
> > >        compiler.joinmemory=536870912
> > >
> > >     3. For the JVM setting, 1024MB is too small for the NC.
> > >        In the shared NC section in cc.conf, you can add:
> > >        jvm.args=-Xmx16G
> > >
> > >     4. For Pig and Hive, you can set the maximum mapper/reducer numbers in
> > > the MapReduce configuration, e.g., at most 4 mappers per machine and at
> > > most 4 reducers per machine.
> > >
> > >     5. 
I'm not super-familiar with hyper threading, but it might be > > worth > > > trying 8 partitions per machine, i.e., 128 partitions in total. > > > > > > To validate if the new settings work, you can go to the > > admin/cluster > > > page to double check. > > > Pls keep us updated and let us know if you run into any issue. > > > Thanks! > > > > > > Best, > > > Yingyi > > > > > > > > > On Tue, Dec 20, 2016 at 9:26 PM, mingda li > > wrote: > > > > > > > Dear Yingyi, > > > > 1. For the returned of :http://:19002/admin/cluster > > > > > > > > { > > > > "cc": { > > > > "configUri": "http://scai01.cs.ucla.edu: > > > > 19002/admin/cluster/cc/config", > > > > "statsUri": "http://scai01.cs.ucla.edu: > > > > 19002/admin/cluster/cc/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump" > > > > }, > > > > "config": { > > > > "api.port": 19002, > > > > "cc.java.opts": "-Xmx1024m", > > > > "cluster.partitions": { > > > > "0": "ID:0, Original Node: red16, IODevice: 0, Active > Node: > > > > red16", > > > > "1": "ID:1, Original Node: red15, IODevice: 0, Active > Node: > > > > red15", > > > > "2": "ID:2, Original Node: red14, IODevice: 0, Active > Node: > > > > red14", > > > > "3": "ID:3, Original Node: red13, IODevice: 0, Active > Node: > > > > red13", > > > > "4": "ID:4, Original Node: red12, IODevice: 0, Active > Node: > > > > red12", > > > > "5": "ID:5, Original Node: red11, IODevice: 0, Active > Node: > > > > red11", > > > > "6": "ID:6, Original Node: red10, IODevice: 0, Active > Node: > > > > red10", > > > > "7": "ID:7, Original Node: red9, IODevice: 0, Active > Node: > > > > red9", > > > > "8": "ID:8, Original Node: red8, IODevice: 0, Active > Node: > > > > red8", > > > > "9": "ID:9, Original Node: red7, IODevice: 0, Active > Node: > > > > red7", > > > > "10": "ID:10, Original Node: red6, IODevice: 0, Active > > Node: > > > > red6", > > > > "11": "ID:11, Original Node: red5, IODevice: 0, Active > > Node: > > > > red5", > > > > "12": "ID:12, Original Node: red4, IODevice: 0, Active > > Node: > > > > red4", > > > > "13": "ID:13, Original Node: red3, IODevice: 0, Active > > Node: > > > > red3", > > > > "14": "ID:14, Original Node: red2, IODevice: 0, Active > > Node: > > > > red2", > > > > "15": "ID:15, Original Node: red, IODevice: 0, Active > Node: > > > > red" > > > > }, > > > > "compiler.framesize": 32768, > > > > "compiler.groupmemory": 33554432, > > > > "compiler.joinmemory": 33554432, > > > > "compiler.pregelix.home": "~/pregelix", > > > > "compiler.sortmemory": 33554432, > > > > "core.dump.paths": { > > > > "red": "/home/clash/asterixStorage/ > > asterixdb5/red/coredump", > > > > "red10": "/home/clash/asterixStorage/ > > > > asterixdb5/red10/coredump", > > > > "red11": "/home/clash/asterixStorage/ > > > > asterixdb5/red11/coredump", > > > > "red12": "/home/clash/asterixStorage/ > > > > asterixdb5/red12/coredump", > > > > "red13": "/home/clash/asterixStorage/ > > > > asterixdb5/red13/coredump", > > > > "red14": "/home/clash/asterixStorage/ > > > > asterixdb5/red14/coredump", > > > > "red15": "/home/clash/asterixStorage/ > > > > asterixdb5/red15/coredump", > > > > "red16": "/home/clash/asterixStorage/ > > > > asterixdb5/red16/coredump", > > > > "red2": "/home/clash/asterixStorage/ > > > asterixdb5/red2/coredump", > > > > "red3": "/home/clash/asterixStorage/ > > > asterixdb5/red3/coredump", > > > > "red4": "/home/clash/asterixStorage/ > > > asterixdb5/red4/coredump", > > > > "red5": "/home/clash/asterixStorage/ > > > asterixdb5/red5/coredump", > > > > 
"red6": "/home/clash/asterixStorage/ > > > asterixdb5/red6/coredump", > > > > "red7": "/home/clash/asterixStorage/ > > > asterixdb5/red7/coredump", > > > > "red8": "/home/clash/asterixStorage/ > > > asterixdb5/red8/coredump", > > > > "red9": "/home/clash/asterixStorage/ > > > asterixdb5/red9/coredump" > > > > }, > > > > "feed.central.manager.port": 4500, > > > > "feed.max.threshold.period": 5, > > > > "feed.memory.available.wait.timeout": 10, > > > > "feed.memory.global.budget": 67108864, > > > > "feed.pending.work.threshold": 50, > > > > "feed.port": 19003, > > > > "instance.name": "DEFAULT_INSTANCE", > > > > "log.level": "WARNING", > > > > "max.wait.active.cluster": 60, > > > > "metadata.callback.port": 0, > > > > "metadata.node": "red16", > > > > "metadata.partition": "ID:0, Original Node: red16, IODevice= : > > > > 0, Active Node: red16", > > > > "metadata.port": 0, > > > > "metadata.registration.timeout.secs": 60, > > > > "nc.java.opts": "-Xmx1024m", > > > > "node.partitions": { > > > > "red": ["ID:15, Original Node: red, IODevice: 0, Active > > Node: > > > > red"], > > > > "red10": ["ID:6, Original Node: red10, IODevice: 0, > Active > > > > Node: red10"], > > > > "red11": ["ID:5, Original Node: red11, IODevice: 0, > Active > > > > Node: red11"], > > > > "red12": ["ID:4, Original Node: red12, IODevice: 0, > Active > > > > Node: red12"], > > > > "red13": ["ID:3, Original Node: red13, IODevice: 0, > Active > > > > Node: red13"], > > > > "red14": ["ID:2, Original Node: red14, IODevice: 0, > Active > > > > Node: red14"], > > > > "red15": ["ID:1, Original Node: red15, IODevice: 0, > Active > > > > Node: red15"], > > > > "red16": ["ID:0, Original Node: red16, IODevice: 0, > Active > > > > Node: red16"], > > > > "red2": ["ID:14, Original Node: red2, IODevice: 0, Acti= ve > > > > Node: red2"], > > > > "red3": ["ID:13, Original Node: red3, IODevice: 0, Acti= ve > > > > Node: red3"], > > > > "red4": ["ID:12, Original Node: red4, IODevice: 0, Acti= ve > > > > Node: red4"], > > > > "red5": ["ID:11, Original Node: red5, IODevice: 0, Acti= ve > > > > Node: red5"], > > > > "red6": ["ID:10, Original Node: red6, IODevice: 0, Acti= ve > > > > Node: red6"], > > > > "red7": ["ID:9, Original Node: red7, IODevice: 0, Activ= e > > > > Node: red7"], > > > > "red8": ["ID:8, Original Node: red8, IODevice: 0, Activ= e > > > > Node: red8"], > > > > "red9": ["ID:7, Original Node: red9, IODevice: 0, Activ= e > > > > Node: red9"] > > > > }, > > > > "node.stores": { > > > > "red": ["/home/clash/asterixStorage/ > > > asterixdb5/red/storage"], > > > > "red10": ["/home/clash/asterixStorage/ > > > > asterixdb5/red10/storage"], > > > > "red11": ["/home/clash/asterixStorage/ > > > > asterixdb5/red11/storage"], > > > > "red12": ["/home/clash/asterixStorage/ > > > > asterixdb5/red12/storage"], > > > > "red13": ["/home/clash/asterixStorage/ > > > > asterixdb5/red13/storage"], > > > > "red14": ["/home/clash/asterixStorage/ > > > > asterixdb5/red14/storage"], > > > > "red15": ["/home/clash/asterixStorage/ > > > > asterixdb5/red15/storage"], > > > > "red16": ["/home/clash/asterixStorage/ > > > > asterixdb5/red16/storage"], > > > > "red2": ["/home/clash/asterixStorage/ > > > > asterixdb5/red2/storage"], > > > > "red3": ["/home/clash/asterixStorage/ > > > > asterixdb5/red3/storage"], > > > > "red4": ["/home/clash/asterixStorage/ > > > > asterixdb5/red4/storage"], > > > > "red5": ["/home/clash/asterixStorage/ > > > > asterixdb5/red5/storage"], > > > > "red6": ["/home/clash/asterixStorage/ > > > > asterixdb5/red6/storage"], > > > > 
"red7": ["/home/clash/asterixStorage/ > > > > asterixdb5/red7/storage"], > > > > "red8": ["/home/clash/asterixStorage/ > > > > asterixdb5/red8/storage"], > > > > "red9": ["/home/clash/asterixStorage/ > > > asterixdb5/red9/storage"] > > > > }, > > > > "plot.activate": false, > > > > "replication.enabled": false, > > > > "replication.factor": 2, > > > > "replication.log.batchsize": 4096, > > > > "replication.log.buffer.numpages": 8, > > > > "replication.log.buffer.pagesize": 131072, > > > > "replication.max.remote.recovery.attempts": 5, > > > > "replication.timeout": 30, > > > > "storage.buffercache.maxopenfiles": 2147483647, > > > > "storage.buffercache.pagesize": 131072, > > > > "storage.buffercache.size": 536870912, > > > > "storage.lsm.bloomfilter.falsepositiverate": 0.01, > > > > "storage.memorycomponent.globalbudget": 536870912, > > > > "storage.memorycomponent.numcomponents": 2, > > > > "storage.memorycomponent.numpages": 256, > > > > "storage.memorycomponent.pagesize": 131072, > > > > "storage.metadata.memorycomponent.numpages": 256, > > > > "transaction.log.dirs": { > > > > "red": "/home/clash/asterixStorage/ > asterixdb5/red/txnlog", > > > > "red10": "/home/clash/asterixStorage/ > > > asterixdb5/red10/txnlog", > > > > "red11": "/home/clash/asterixStorage/ > > > asterixdb5/red11/txnlog", > > > > "red12": "/home/clash/asterixStorage/ > > > asterixdb5/red12/txnlog", > > > > "red13": "/home/clash/asterixStorage/ > > > asterixdb5/red13/txnlog", > > > > "red14": "/home/clash/asterixStorage/ > > > asterixdb5/red14/txnlog", > > > > "red15": "/home/clash/asterixStorage/ > > > asterixdb5/red15/txnlog", > > > > "red16": "/home/clash/asterixStorage/ > > > asterixdb5/red16/txnlog", > > > > "red2": "/home/clash/asterixStorage/ > > asterixdb5/red2/txnlog", > > > > "red3": "/home/clash/asterixStorage/ > > asterixdb5/red3/txnlog", > > > > "red4": "/home/clash/asterixStorage/ > > asterixdb5/red4/txnlog", > > > > "red5": "/home/clash/asterixStorage/ > > asterixdb5/red5/txnlog", > > > > "red6": "/home/clash/asterixStorage/ > > asterixdb5/red6/txnlog", > > > > "red7": "/home/clash/asterixStorage/ > > asterixdb5/red7/txnlog", > > > > "red8": "/home/clash/asterixStorage/ > > asterixdb5/red8/txnlog", > > > > "red9": "/home/clash/asterixStorage/ > > asterixdb5/red9/txnlog" > > > > }, > > > > "txn.commitprofiler.reportinterval": 5, > > > > "txn.job.recovery.memorysize": 67108864, > > > > "txn.lock.escalationthreshold": 1000, > > > > "txn.lock.shrinktimer": 5000, > > > > "txn.lock.timeout.sweepthreshold": 10000, > > > > "txn.lock.timeout.waitthreshold": 60000, > > > > "txn.log.buffer.numpages": 8, > > > > "txn.log.buffer.pagesize": 131072, > > > > "txn.log.checkpoint.history": 0, > > > > "txn.log.checkpoint.lsnthreshold": 67108864, > > > > "txn.log.checkpoint.pollfrequency": 120, > > > > "txn.log.partitionsize": 268435456, > > > > "web.port": 19001, > > > > "web.queryinterface.port": 19006, > > > > "web.secondary.port": 19005 > > > > }, > > > > "fullShutdownUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=3Dtrue", > > > > "metadata_node": "red16", > > > > "ncs": [ > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config", > > > > "node_id": "red15", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_1" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats", > > > > "threadDumpUri": > > > > 
"http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config", > > > > "node_id": "red14", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_2" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config", > > > > "node_id": "red16", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_0" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config", > > > > "node_id": "red11", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_5" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config", > > > > "node_id": "red10", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_6" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config", > > > > "node_id": "red13", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_3" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config", > > > > "node_id": "red12", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_4" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddum= p > " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config", > > > > "node_id": "red6", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_10" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config", > > > > "node_id": "red", > > > > "partitions": [{ > > > > 
"active": true, > > > > "partition_id": "partition_15" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump" > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config", > > > > "node_id": "red5", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_11" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config", > > > > "node_id": "red8", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_8" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config", > > > > "node_id": "red7", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_9" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config", > > > > "node_id": "red2", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_14" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config", > > > > "node_id": "red4", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_12" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config", > > > > "node_id": "red3", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_13" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump= " > > > > }, > > > > { > > > > "configUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config", > > > > "node_id": "red9", > > > > "partitions": [{ > > > > "active": true, > > > > "partition_id": "partition_7" > > > > }], > > > > "state": "ACTIVE", > > > > "statsUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats", > > > > "threadDumpUri": > > > > "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump= " > 
> > > >         }
> > > >     ],
> > > >     "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
> > > >     "state": "ACTIVE",
> > > >     "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
> > > > }
> > > >
> > > > 2. Dataset sizes:
> > > >    catalog_returns: 2.28G
> > > >    catalog_sales: 31.01G
> > > >    inventory: 8.63G
> > > >
> > > > 3. As for Pig and Hive, I always use the default configuration; I didn't
> > > > set any partition parameters for them. For Spark, we use 200 partitions,
> > > > which could probably be improved but is not bad. For AsterixDB, I also
> > > > left the cluster at the default partition and JVM settings (I didn't
> > > > manually set these parameters).
> > > >
> > > > On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu wrote:
> > > >
> > > > > Mingda,
> > > > >
> > > > >     1. Can you paste the returned JSON of
> > > > > http://<master node>:19002/admin/cluster at your side? (Pls replace
> > > > > <master node> with the actual master node name or IP.)
> > > > >     2. Can you list the individual size of each dataset involved in the
> > > > > query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
> > > > > 100GB is the overall size?)
> > > > >     3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how
> > > > > many partitions are running on each machine? (It seems that your
> > > > > AsterixDB configuration wouldn't saturate all CPUs for queries --- in
> > > > > the current AsterixDB master, the computation parallelism is set to be
> > > > > the same as the storage parallelism, i.e., the number of iodevices on
> > > > > each NC. I've submitted a new patch that allows flexible computation
> > > > > parallelism, which should be merged into master very soon.)
> > > > >     Thanks!
> > > > >
> > > > > Best,
> > > > > Yingyi
> > > > >
> > > > > On Tue, Dec 20, 2016 at 5:44 PM, mingda li wrote:
> > > > >
> > > > > > Oh, sure. When we tested the 100G multiple join, we found that
> > > > > > AsterixDB is slower than Spark (but still faster than Pig and Hive).
> > > > > > I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
> > > > > > only use 1-10G.eps in our paper.)
> > > > > > And thanks for Ian's advice: *The dev list generally strips
> > > > > > attachments. Maybe you can just put the config inline? Or link to a
> > > > > > pastebin/gist?* Now I know why you can't see the attachments, so I
> > > > > > have moved the plots and the two documents to my Dropbox.
> > > > > > You can find
> > > > > > 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > > > > 1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > > > > cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > > > > CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > > > > >
> > > > > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <tcondie.ucla@gmail.com> wrote:
> > > > > >
> > > > > > > Mingda: Please also share the numbers for 100GB, which show
> > > > > > > AsterixDB not quite doing as well as Spark. These 100GB results
> > > > > > > will not be in our submission version, since they're not needed for
> > > > > > > the desired message: picking the right join order matters.
> > > > > > > Nevertheless, I'd like to get a better understanding of what's
> > > > > > > going on in the larger dataset regime.
> > > > > > >
> > > > > > > -Tyson
> > > > > > >
> > > > > > > From: Yingyi Bu [mailto:buyingyi@gmail.com]
> > > > > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > > > > To: dev@asterixdb.apache.org
> > > > > > > Cc: Michael Carey; Tyson Condie <tcondie.ucla@gmail.com>
> > > > > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > > > > >
> > > > > > > Hi Mingda,
> > > > > > >
> > > > > > > It looks like you didn't attach the PDF?
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yingyi
> > > > > > >
> > > > > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <limingda1993@gmail.com> wrote:
> > > > > > >
> > > > > > > Sorry for the wrong version of cc.conf. I have converted it to a
> > > > > > > PDF and attached it.
> > > > > > >
> > > > > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <limingda1993@gmail.com> wrote:
> > > > > > >
> > > > > > > Dear all,
> > > > > > >
> > > > > > > I am testing multiple joins on different systems (AsterixDB, Spark,
> > > > > > > Hive, Pig) to see whether there is a big difference between join
> > > > > > > orders. This is the motivation for our research on multi-way joins,
> > > > > > > and the results will appear in our paper, which is to be submitted
> > > > > > > to VLDB soon. Could you help us make sure that the test results
> > > > > > > make sense for AsterixDB?
> > > > > > >
> > > > > > > We configured AsterixDB 0.8.9 (using the asterix-server-0.8.9-SNAPSHOT
> > > > > > > binary assembly) on our cluster of 16 machines, each with a 3.40GHz
> > > > > > > i7 processor (4 cores and 2 hyper-threads per core), 32GB of RAM,
> > > > > > > and 1TB of disk capacity. The operating system is 64-bit Ubuntu
> > > > > > > 12.04 and the JDK version is 1.8.0. During configuration, I
> > > > > > > followed the NCService instructions at
> > > > > > > https://ci.apache.org/projects/asterixdb/ncservice.html and set
> > > > > > > cc.conf as in the attachment (each node works as an NC, and the
> > > > > > > first node also works as the CC).
> > > > > > >
> > > > > > > For the experiments, we use 3 fact tables from TPC-DS (inventory,
> > > > > > > catalog_sales, and catalog_returns) at TPC-DS scale factors 1G and 10G.
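(The cc.conf attachment mentioned above did not survive in this archive. Below
is a minimal sketch of the kind of NCService-style cc.conf being discussed,
using the tuned values Yingyi suggests at the top of this thread. The
iodevices, jvm.args, and compiler.joinmemory keys are the ones quoted in the
thread; the overall layout, the '#' comment lines, and the use of a single
comma-separated iodevices list are assumptions that should be checked against
the ncservice documentation linked above.)

    # Hypothetical sketch only -- not the original attachment.
    [cc]
    # 536870912 bytes = 512 MB of join memory per joiner.
    compiler.joinmemory=536870912

    [nc]
    # Settings shared by all NCs; a larger heap than the 1024 MB default.
    jvm.args=-Xmx16G

    [nc/red16]
    # Four iodevices per NC gives 4 partitions per machine (64 in total).
    iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

One [nc/<node>] section would be repeated for each of the 16 machines (red,
red2, ..., red16); the per-node storage, transaction-log, core-dump, and
address settings are omitted here.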
> > > > > > > The multiple-join queries we use in AsterixDB are as follows:
> > > > > > >
> > > > > > > Good Join Order:
> > > > > > >
> > > > > > > SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN
> > > > > > > catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number
> > > > > > > AND cs1.cs_item_sk = cr1.cr_item_sk)) m1 JOIN inventory i1 ON
> > > > > > > i1.inv_item_sk = cs1.cs_item_sk;
> > > > > > >
> > > > > > > Bad Join Order:
> > > > > > >
> > > > > > > SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN
> > > > > > > inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1 JOIN
> > > > > > > catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number
> > > > > > > AND cs1.cs_item_sk = cr1.cr_item_sk);
> > > > > > >
> > > > > > > We load the data into AsterixDB first and then run the two
> > > > > > > different queries. (The complete set of queries for AsterixDB is
> > > > > > > in the attachment.) We assume the data has already been stored in
> > > > > > > AsterixDB and only count the time for the multiple joins.
> > > > > > >
> > > > > > > Meanwhile, we use the same datasets and queries to test Spark,
> > > > > > > Pig, and Hive. The results are shown in the attached figure, and
> > > > > > > you can see that AsterixDB's time is always better than the
> > > > > > > others, with both the good and the bad order :-) (BTW, the y-axis
> > > > > > > of the figure is time on a log scale; you can read the time from
> > > > > > > the label on each bar.)
> > > > > > >
> > > > > > > Thanks for your help.
> > > > > > >
> > > > > > > Bests,
> > > > > > > Mingda
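As a closing aside for anyone re-running these experiments: the /admin/cluster
endpoint used throughout this thread is a convenient way to verify that
configuration changes (partition count, join memory, NC heap) actually took
effect. Below is a small Python sketch of such a check; the CC hostname is the
one appearing in the URLs above and the field names come from the JSON pasted
earlier in the thread, so treat it as an untested illustration rather than a
supported tool.

    # Sketch: summarize an AsterixDB cluster via the /admin/cluster endpoint.
    # Hostname and field names are taken from the JSON shown earlier in this
    # thread; adjust them for your own cluster.
    import json
    import urllib.request

    CC_HOST = "scai01.cs.ucla.edu"   # CC node used in this thread
    API_PORT = 19002                 # "api.port" in the cluster config above

    def cluster_summary(cc_host: str = CC_HOST, port: int = API_PORT) -> None:
        url = "http://{}:{}/admin/cluster".format(cc_host, port)
        with urllib.request.urlopen(url) as resp:
            info = json.load(resp)

        config = info["config"]
        # The knobs discussed in this thread.
        print("compiler.joinmemory:", config.get("compiler.joinmemory"))
        print("nc.java.opts:       ", config.get("nc.java.opts"))
        print("total partitions:   ", len(config.get("cluster.partitions", {})))

        # One line per NC: its id, state, and how many partitions it owns.
        for nc in info["ncs"]:
            print(nc["node_id"], nc["state"], "partitions:", len(nc["partitions"]))

    if __name__ == "__main__":
        cluster_summary()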