asterixdb-dev mailing list archives

From Wail Alkowaileet <wael....@gmail.com>
Subject Re: Time of Multiple Joins in AsterixDB
Date Wed, 21 Dec 2016 06:36:20 GMT
Hi Mingda,

What Spark version did you use for your benchmark, and what is the
SPARK_SLAVE_MEMORY budget? Also, what is the file format for Spark (Parquet
or CSV)?

On Wed, Dec 21, 2016 at 9:20 AM, Yingyi Bu <buyingyi@gmail.com> wrote:

> Hi Mingda,
>
>      I think that in your setting, a better configuration for AsterixDB
> might be to use 64 partitions, i.e., 4 cores x 16 nodes.
>      1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
>          iodevices=/home/clash/asterixStorage/asterixdb5/red16
>          -->
>          iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
> iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
> iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
> iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
>
>      2. Assuming you have 64 partitions, 31.01G/64 ~= 0.5G. That means you
> should give each joiner a 512MB memory budget so as to make the join
> memory-resident.
>          To achieve that,  in the cc section in the cc.conf, you could add:
>          compiler.joinmemory=536870912
>
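[Editor's note: as a quick sanity check on the arithmetic above -- a sketch only; the 31.01G figure is the catalog_sales size quoted later in the thread:]

```python
# Rough check of the per-joiner memory math suggested above.
dataset_gb = 31.01           # size of catalog_sales, the largest join input
partitions = 64              # 16 NCs * 4 iodevices each
per_partition_gb = dataset_gb / partitions
print(round(per_partition_gb, 3))    # -> 0.485, i.e., ~0.5 GB per partition

# 512 MB expressed in bytes: the value to set for compiler.joinmemory.
join_memory_bytes = 512 * 1024 * 1024
print(join_memory_bytes)             # -> 536870912
```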
>      3. For the JVM setting, 1024MB is too small for the NC.
>          In the shared NC section in cc.conf, you can add:
>          jvm.args=-Xmx16G
>
>      4. For Pig and Hive, you can set the maximum mapper/reducer numbers in
> the MapReduce configuration, e.g., at most 4 mappers per machine and at
> most 4 reducers per machine.
>
>      5. I'm not super-familiar with hyper-threading, but it might be worth
> trying 8 partitions per machine, i.e., 128 partitions in total.
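[Editor's note: taken together, suggestions 1-3 would land in cc.conf roughly like this. This is a sketch only; the section names follow the NCService-style cc.conf from the setup described later in the thread, and only the red16 NC section is shown:]

```ini
[nc/red16]
; four iodevices => four storage partitions on this machine (suggestion 1)
iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

[nc]
; shared NC section: a larger JVM heap than the 1024MB default (suggestion 3)
jvm.args=-Xmx16G

[cc]
; 512MB join memory budget per joiner (suggestion 2)
compiler.joinmemory=536870912
```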
>
>      To verify that the new settings took effect, you can go to the admin/cluster
> page to double-check.
>      Please keep us updated and let us know if you run into any issues.
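[Editor's note: the admin/cluster check can also be scripted. A minimal sketch -- the trimmed two-node sample below stands in for the real JSON response shown later in the thread:]

```python
import json

# Count partitions per node in the JSON returned by
# http://<master node>:19002/admin/cluster (trimmed sample used here).
sample = """
{
  "ncs": [
    {"node_id": "red16", "partitions": [{"active": true, "partition_id": "partition_0"}]},
    {"node_id": "red15", "partitions": [{"active": true, "partition_id": "partition_1"}]}
  ]
}
"""

cluster = json.loads(sample)
per_node = {nc["node_id"]: len(nc["partitions"]) for nc in cluster["ncs"]}
print(per_node)                 # {'red16': 1, 'red15': 1}
print(sum(per_node.values()))   # 2 for this sample; with 4 iodevices per NC on 16 nodes, expect 64
```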
>      Thanks!
>
> Best,
> Yingyi
>
>
> On Tue, Dec 20, 2016 at 9:26 PM, mingda li <limingda1993@gmail.com> wrote:
>
> > Dear Yingyi,
> > 1. Here is the returned JSON of http://<master node>:19002/admin/cluster:
> >
> > {
> >     "cc": {
> >         "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
> >         "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
> >         "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
> >     },
> >     "config": {
> >         "api.port": 19002,
> >         "cc.java.opts": "-Xmx1024m",
> >         "cluster.partitions": {
> >             "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
> >             "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
> >             "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
> >             "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
> >             "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
> >             "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
> >             "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
> >             "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
> >             "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
> >             "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
> >             "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
> >             "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
> >             "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
> >             "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
> >             "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
> >             "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
> >         },
> >         "compiler.framesize": 32768,
> >         "compiler.groupmemory": 33554432,
> >         "compiler.joinmemory": 33554432,
> >         "compiler.pregelix.home": "~/pregelix",
> >         "compiler.sortmemory": 33554432,
> >         "core.dump.paths": {
> >             "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
> >             "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
> >             "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
> >             "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
> >             "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
> >             "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
> >             "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
> >             "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
> >             "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
> >             "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
> >             "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
> >             "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
> >             "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
> >             "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
> >             "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
> >             "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
> >         },
> >         "feed.central.manager.port": 4500,
> >         "feed.max.threshold.period": 5,
> >         "feed.memory.available.wait.timeout": 10,
> >         "feed.memory.global.budget": 67108864,
> >         "feed.pending.work.threshold": 50,
> >         "feed.port": 19003,
> >         "instance.name": "DEFAULT_INSTANCE",
> >         "log.level": "WARNING",
> >         "max.wait.active.cluster": 60,
> >         "metadata.callback.port": 0,
> >         "metadata.node": "red16",
> >         "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
> >         "metadata.port": 0,
> >         "metadata.registration.timeout.secs": 60,
> >         "nc.java.opts": "-Xmx1024m",
> >         "node.partitions": {
> >             "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
> >             "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
> >             "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
> >             "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
> >             "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
> >             "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
> >             "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
> >             "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
> >             "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
> >             "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
> >             "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
> >             "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
> >             "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
> >             "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
> >             "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
> >             "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
> >         },
> >         "node.stores": {
> >             "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
> >             "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
> >             "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
> >             "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
> >             "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
> >             "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
> >             "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
> >             "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
> >             "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
> >             "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
> >             "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
> >             "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
> >             "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
> >             "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
> >             "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
> >             "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
> >         },
> >         "plot.activate": false,
> >         "replication.enabled": false,
> >         "replication.factor": 2,
> >         "replication.log.batchsize": 4096,
> >         "replication.log.buffer.numpages": 8,
> >         "replication.log.buffer.pagesize": 131072,
> >         "replication.max.remote.recovery.attempts": 5,
> >         "replication.timeout": 30,
> >         "storage.buffercache.maxopenfiles": 2147483647,
> >         "storage.buffercache.pagesize": 131072,
> >         "storage.buffercache.size": 536870912,
> >         "storage.lsm.bloomfilter.falsepositiverate": 0.01,
> >         "storage.memorycomponent.globalbudget": 536870912,
> >         "storage.memorycomponent.numcomponents": 2,
> >         "storage.memorycomponent.numpages": 256,
> >         "storage.memorycomponent.pagesize": 131072,
> >         "storage.metadata.memorycomponent.numpages": 256,
> >         "transaction.log.dirs": {
> >             "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
> >             "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
> >             "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
> >             "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
> >             "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
> >             "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
> >             "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
> >             "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
> >             "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
> >             "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
> >             "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
> >             "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
> >             "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
> >             "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
> >             "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
> >             "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
> >         },
> >         "txn.commitprofiler.reportinterval": 5,
> >         "txn.job.recovery.memorysize": 67108864,
> >         "txn.lock.escalationthreshold": 1000,
> >         "txn.lock.shrinktimer": 5000,
> >         "txn.lock.timeout.sweepthreshold": 10000,
> >         "txn.lock.timeout.waitthreshold": 60000,
> >         "txn.log.buffer.numpages": 8,
> >         "txn.log.buffer.pagesize": 131072,
> >         "txn.log.checkpoint.history": 0,
> >         "txn.log.checkpoint.lsnthreshold": 67108864,
> >         "txn.log.checkpoint.pollfrequency": 120,
> >         "txn.log.partitionsize": 268435456,
> >         "web.port": 19001,
> >         "web.queryinterface.port": 19006,
> >         "web.secondary.port": 19005
> >     },
> >     "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
> >     "metadata_node": "red16",
> >     "ncs": [
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
> >             "node_id": "red15",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_1"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
> >             "node_id": "red14",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_2"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
> >             "node_id": "red16",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_0"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
> >             "node_id": "red11",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_5"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
> >             "node_id": "red10",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_6"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
> >             "node_id": "red13",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_3"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
> >             "node_id": "red12",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_4"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
> >             "node_id": "red6",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_10"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
> >             "node_id": "red",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_15"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
> >             "node_id": "red5",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_11"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
> >             "node_id": "red8",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_8"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
> >             "node_id": "red7",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_9"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
> >             "node_id": "red2",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_14"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
> >             "node_id": "red4",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_12"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
> >             "node_id": "red3",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_13"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
> >         },
> >         {
> >             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
> >             "node_id": "red9",
> >             "partitions": [{
> >                 "active": true,
> >                 "partition_id": "partition_7"
> >             }],
> >             "state": "ACTIVE",
> >             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
> >             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
> >         }
> >     ],
> >     "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
> >     "state": "ACTIVE",
> >     "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
> > }
> >
> > 2. Dataset sizes: catalog_returns: 2.28G; catalog_sales: 31.01G; inventory: 8.63G.
> >
> > 3. As for Pig and Hive, I always use the default configuration; I didn't
> > set the partitioning for them. For Spark, we use 200 partitions, which
> > could perhaps be improved but is not bad. For AsterixDB, I also set up the
> > cluster with the default partition and JVM values (I didn't manually set
> > these parameters).
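[Editor's note: for context on the Spark side, 200 is the stock default for Spark SQL's shuffle partitions, so "200 partitions" likely means the default was left in place. The knobs involved usually look like this; the memory value below is an illustrative assumption, not from the thread:]

```ini
# spark-defaults.conf (illustrative fragment)
# Number of partitions used for joins/aggregations in Spark SQL; defaults to 200.
spark.sql.shuffle.partitions  200
# Default parallelism for RDD operations.
spark.default.parallelism     200
# Per-executor heap; worth comparing against AsterixDB's NC heap when benchmarking.
spark.executor.memory         16g
```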
> >
> >
> >
> > On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >
> > > Mingda,
> > >
> > >      1. Can you paste the returned JSON of http://<master
> > > node>:19002/admin/cluster at your side? (Please replace <master node>
> > > with the actual master node name or IP.)
> > >      2. Can you list the individual size of each dataset involved in the
> > > query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
> > > 100GB is the overall size?)
> > >      3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how
> > > many partitions are running on each machine? (It seems that your AsterixDB
> > > configuration wouldn't saturate all CPUs for queries --- in the current
> > > AsterixDB master, the computation parallelism is set to be the same as the
> > > storage parallelism, i.e., the number of iodevices on each NC. I've
> > > submitted a new patch that allows flexible computation parallelism, which
> > > should be merged into master very soon.)
> > >      Thanks!
> > >
> > > Best,
> > > Yingyi
> > >
> > > On Tue, Dec 20, 2016 at 5:44 PM, mingda li <limingda1993@gmail.com>
> > wrote:
> > >
> > > > Oh, sure. When we test the 100G multi-way join, we find AsterixDB is
> > > > slower than Spark (but still faster than Pig and Hive).
> > > > I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
> > > > only use 1-10G.eps in our paper.)
> > > > And thanks for Ian's advice: "The dev list generally strips attachments.
> > > > Maybe you can just put the config inline? Or link to a pastebin/gist?"
> > > > Now I know why you couldn't see the attachments, so I have moved the plots
> > > > and the two documents to my Dropbox.
> > > > You can find
> > > > 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
> > > > 1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
> > > > cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
> > > > CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
> > > >
> > > > On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <
> tcondie.ucla@gmail.com>
> > > > wrote:
> > > >
> > > > > Mingda: Please also share the numbers for 100GB, which show
> AsterixDB
> > > not
> > > > > quite doing as well as Spark. These 100GB results will not be in
> our
> > > > > submission version, since they’re not needed for the desired
> message:
> > > > > picking the right join order matters. Nevertheless, I’d like to
> get a
> > > > > better understanding of what’s going on in the larger dataset
> regime.
> > > > >
> > > > >
> > > > >
> > > > > -Tyson
> > > > >
> > > > >
> > > > >
> > > > > From: Yingyi Bu [mailto:buyingyi@gmail.com]
> > > > > Sent: Tuesday, December 20, 2016 4:30 PM
> > > > > To: dev@asterixdb.apache.org
> > > > > Cc: Michael Carey <mjcarey@ics.uci.edu>; Tyson Condie <
> > > > > tcondie.ucla@gmail.com>
> > > > > Subject: Re: Time of Multiple Joins in AsterixDB
> > > > >
> > > > >
> > > > >
> > > > > Hi Mingda,
> > > > >
> > > > >
> > > > >
> > > > >      It looks like you didn't attach the PDF?
> > > > >
> > > > >      Thanks!
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Yingyi
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 20, 2016 at 4:15 PM, mingda li <limingda1993@gmail.com
> > > > > <mailto:limingda1993@gmail.com> > wrote:
> > > > >
> > > > > Sorry for the wrong version of cc.conf. I have converted it to a PDF
> > > > > version as an attachment.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 20, 2016 at 4:06 PM, mingda li <limingda1993@gmail.com
> > > > > <mailto:limingda1993@gmail.com> > wrote:
> > > > >
> > > > > Dear all,
> > > > >
> > > > >
> > > > >
> > > > > I am testing several systems' (AsterixDB, Spark, Hive, Pig) multi-way
> > > > > joins to see whether different join orders make a big difference. This
> > > > > is the motivation for our research on multi-way joins, and the results
> > > > > will appear in our paper, which will be submitted to VLDB soon. Could
> > > > > you help us make sure that the test results make sense for AsterixDB?
> > > > >
> > > > >
> > > > >
> > > > > We configured AsterixDB 0.8.9 (using the
> > > > > asterix-server-0.8.9-SNAPSHOT-binary-assembly package) on our cluster
> > > > > of 16 machines, each with a 3.40GHz i7 processor (4 cores and 2
> > > > > hyper-threads per core), 32GB of RAM, and 1TB of disk capacity. The
> > > > > operating system is 64-bit Ubuntu 12.04, and the JDK version is 1.8.0.
> > > > > During configuration, I followed the NCService instructions at
> > > > > https://ci.apache.org/projects/asterixdb/ncservice.html, and I set
> > > > > cc.conf as in the attachment. (Each node works as an NC, and the first
> > > > > node also works as the CC.)
> > > > >
> > > > >
> > > > >
> > > > > For the experiment, we use 3 fact tables from TPC-DS: inventory,
> > > > > catalog_sales, and catalog_returns, with TPC-DS scale factors 1g and
> > > > > 10g. The multi-way join queries we use in AsterixDB are as follows:
> > > > >
> > > > >
> > > > >
> > > > > Good Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk)) m1 JOIN inventory i1 ON i1.inv_item_sk = cs1.cs_item_sk;
> > > > >
> > > > >
> > > > >
> > > > > Bad Join Order: SELECT COUNT(*) FROM (SELECT * FROM catalog_sales cs1 JOIN inventory i1 ON cs1.cs_item_sk = i1.inv_item_sk) m1 JOIN catalog_returns cr1 ON (cs1.cs_order_number = cr1.cr_order_number AND cs1.cs_item_sk = cr1.cr_item_sk);
> > > > >
> > > > >
> > > > >
> > > > > We first load the data into AsterixDB and then run the two different
> > > > > queries. (The complete version of all queries for AsterixDB is in the
> > > > > attachment.) We assume the data has already been stored in AsterixDB
> > > > > and count only the time for the multi-way join.
> > > > >
> > > > >
> > > > >
> > > > > Meanwhile, we use the same datasets and queries to test Spark, Pig,
> > > > > and Hive. The results are shown in the attachment's figure, and you
> > > > > can see that AsterixDB's time is always better than the others', no
> > > > > matter whether the join order is good or bad :-) (BTW, the y-axis of
> > > > > the figure is time on a log scale; you can read the time from the
> > > > > label on each bar.)
> > > > >
> > > > >
> > > > >
> > > > > Thanks for your help.
> > > > >
> > > > >
> > > > >
> > > > > Bests,
> > > > >
> > > > > Mingda
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 

*Regards,*
Wail Alkowaileet
