asterixdb-dev mailing list archives

From Yingyi Bu <buyin...@gmail.com>
Subject Re: Time of Multiple Joins in AsterixDB
Date Thu, 22 Dec 2016 19:36:44 GMT
Indeed, the change was merged yesterday.
If you grab the latest master, the computation parallelism can be set by
the parameter compiler.parallelism:
-- 0, the default, means to use the storage parallelism as the computation
parallelism;
-- a negative value means to use all available cores in the cluster;
-- a positive value means to use that number of cores, but it will fall back
to using all available cores if the number is too large.
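
For reference, a minimal sketch of how that could look in cc.conf, following
the same cc-section convention used for the other compiler.* parameters later
in this thread (the value 64 here is only an illustration, matching the
64-partition setup discussed below):

    [cc]
    compiler.parallelism=64

As with the other configuration changes in this thread, the instance needs a
restart for the new value to take effect.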

Best,
Yingyi


On Thu, Dec 22, 2016 at 10:59 AM, Mike Carey <dtabass@gmail.com> wrote:

> It would definitely also be fun to see how the newest more flexible
> parallelism stuff handles this!  (Where you'd specify storage parallelism
> based on drives, and compute parallelism based on cores, both spread across
> all of the cluster's resources.)
>
>
> On 12/22/16 10:57 AM, Yingyi Bu wrote:
>
>> Mingda,
>>
>>
>>       There is one more thing that you can do without the need to reload
>> your data:
>>
>>       -- Enlarge the buffercache memory budget to 8GB so that the datasets
>> can fit into the buffer cache:
>>           "storage.buffercache.size": 536870912
>>           ->
>>           "storage.buffercache.size": 8589934592
>>
>>       You don't need to reload the data; you only need to restart the
>> AsterixDB instance.
>>       Thanks!
>>
>> Best,
>> Yingyi
>>
>>
>>
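For the buffer-cache change suggested above, the edit amounts to one key in
cc.conf; which section the storage.* parameters belong in can vary by release,
so match it against the existing file. 8589934592 bytes = 8 * 1024^3 = 8GB.

    storage.buffercache.size=8589934592

After the edit, restart the instance and confirm the new value on the
admin/cluster page.
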
>> On Wed, Dec 21, 2016 at 9:22 PM, Mike Carey <dtabass@gmail.com> wrote:
>>
>>> Nice!!
>>>
>>> On Dec 21, 2016 8:43 PM, "Yingyi Bu" <buyingyi@gmail.com> wrote:
>>>
>>>> Cool, thanks, Mingda!
>>>> Look forward to the new numbers!
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>> On Wed, Dec 21, 2016 at 7:13 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>
>>>>> Dear Yingyi,
>>>>>
>>>>> Thanks for your suggestion. I have reset AsterixDB and retested it
>>>>> using the new environment on 100G and 10G. The efficiency of all the
>>>>> tests (good and bad join orders) has improved to twice the speed.
>>>>> I will finish all the tests and update the results later.
>>>>>
>>>>> Bests,
>>>>> Mingda
>>>>>
>>>>> On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>
>>>>>> Hi Mingda,
>>>>>>
>>>>>>       I think that in your setting, a better configuration for AsterixDB
>>>>>> might be to use 64 partitions, i.e., 4 cores * 16.
>>>>>>       1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16
>>>>>>           -->
>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
>>>>>>
>>>>>>       2. Assume you have 64 partitions, 31.01G/64 ~= 0.5G. That means
>>>>>> you'd better have a 512MB memory budget for each joiner so as to make
>>>>>> the join memory-resident.
>>>>>>           To achieve that, in the cc section in cc.conf, you could add:
>>>>>>           compiler.joinmemory=536870912
>>>>>>
>>>>>>       3. For the JVM setting, 1024MB is too small for the NC.
>>>>>>           In the shared NC section in cc.conf, you can add:
>>>>>>           jvm.args=-Xmx16G
>>>>>>
>>>>>>       4. For Pig and Hive, you can set the maximum mapper/reducer numbers
>>>>>> in the MapReduce configuration, e.g., at most 4 mappers per machine and
>>>>>> at most 4 reducers per machine.
>>>>>>
>>>>>>       5. I'm not super-familiar with hyper-threading, but it might be
>>>>>> worth trying 8 partitions per machine, i.e., 128 partitions in total.
>>>>>>
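Pulling steps 1-3 above together, the cc.conf additions would look roughly
like the sketch below. The [nc/red16] section name and the comma-separated
iodevices form are assumptions about the ini layout (repeat the pattern for
each of the 16 NCs); the [cc] and shared [nc] placements follow the advice in
steps 2 and 3, and 536870912 bytes = 512MB.

    [nc/red16]
    iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4

    [nc]
    jvm.args=-Xmx16G

    [cc]
    compiler.joinmemory=536870912

With 4 iodevices on each of the 16 NCs, the storage (and hence default
computation) parallelism becomes 64, and each of the 64 joiners gets a 512MB
budget.
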
>>>>>>       To validate that the new settings work, you can go to the
>>>>>> admin/cluster page to double-check.
>>>>>>       Please keep us updated and let us know if you run into any issues.
>>>>>>       Thanks!
>>>>>>
>>>>>> Best,
>>>>>> Yingyi
>>>>>>
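For step 4, assuming the Pig/Hive jobs run on classic MapReduce (MRv1), the
per-machine caps would typically be the tasktracker properties in
mapred-site.xml, e.g.:

    mapred.tasktracker.map.tasks.maximum=4
    mapred.tasktracker.reduce.tasks.maximum=4

Under YARN, per-node task parallelism is governed by container sizing instead,
so the exact knobs depend on the Hadoop setup in use.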
>>>>>>
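A quick way to pull the admin/cluster JSON that Mingda pastes below is a plain
HTTP GET against the master node named in this thread:

    curl http://scai01.cs.ucla.edu:19002/admin/cluster

The fields worth re-checking after the changes above are nc.java.opts,
compiler.joinmemory, storage.buffercache.size, and the number of entries under
cluster.partitions.
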
>>>>>> On Tue, Dec 20, 2016 at 9:26 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>
>>>>>>> Dear Yingyi,
>>>>>>>
>>>>>>> 1. Here is what http://<master node>:19002/admin/cluster returns:
>>>>>>>
>>>>>>> {
>>>>>>>     "cc": {
>>>>>>>         "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
>>>>>>>         "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
>>>>>>>         "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
>>>>>>>     },
>>>>>>>     "config": {
>>>>>>>         "api.port": 19002,
>>>>>>>         "cc.java.opts": "-Xmx1024m",
>>>>>>>         "cluster.partitions": {
>>>>>>>             "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>             "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
>>>>>>>             "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
>>>>>>>             "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
>>>>>>>             "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
>>>>>>>             "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
>>>>>>>             "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
>>>>>>>             "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
>>>>>>>             "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
>>>>>>>             "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
>>>>>>>             "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
>>>>>>>             "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
>>>>>>>             "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
>>>>>>>             "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
>>>>>>>             "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
>>>>>>>             "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
>>>>>>>         },
>>>>>>>         "compiler.framesize": 32768,
>>>>>>>         "compiler.groupmemory": 33554432,
>>>>>>>         "compiler.joinmemory": 33554432,
>>>>>>>         "compiler.pregelix.home": "~/pregelix",
>>>>>>>         "compiler.sortmemory": 33554432,
>>>>>>>         "core.dump.paths": {
>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
>>>>>>>         },
>>>>>>>         "feed.central.manager.port": 4500,
>>>>>>>         "feed.max.threshold.period": 5,
>>>>>>>         "feed.memory.available.wait.timeout": 10,
>>>>>>>         "feed.memory.global.budget": 67108864,
>>>>>>>         "feed.pending.work.threshold": 50,
>>>>>>>         "feed.port": 19003,
>>>>>>>         "instance.name": "DEFAULT_INSTANCE",
>>>>>>>         "log.level": "WARNING",
>>>>>>>         "max.wait.active.cluster": 60,
>>>>>>>         "metadata.callback.port": 0,
>>>>>>>         "metadata.node": "red16",
>>>>>>>         "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>         "metadata.port": 0,
>>>>>>>         "metadata.registration.timeout.secs": 60,
>>>>>>>         "nc.java.opts": "-Xmx1024m",
>>>>>>>         "node.partitions": {
>>>>>>>             "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
>>>>>>>             "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
>>>>>>>             "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
>>>>>>>             "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
>>>>>>>             "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
>>>>>>>             "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
>>>>>>>             "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
>>>>>>>             "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
>>>>>>>             "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
>>>>>>>             "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
>>>>>>>             "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
>>>>>>>             "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
>>>>>>>             "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
>>>>>>>             "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
>>>>>>>             "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
>>>>>>>             "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
>>>>>>>         },
>>>>>>>         "node.stores": {
>>>>>>>             "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
>>>>>>>             "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
>>>>>>>             "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
>>>>>>>             "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
>>>>>>>             "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
>>>>>>>             "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
>>>>>>>             "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
>>>>>>>             "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
>>>>>>>             "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
>>>>>>>             "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
>>>>>>>             "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
>>>>>>>             "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
>>>>>>>             "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
>>>>>>>             "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
>>>>>>>             "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
>>>>>>>             "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
>>>>>>>         },
>>>>>>>         "plot.activate": false,
>>>>>>>         "replication.enabled": false,
>>>>>>>         "replication.factor": 2,
>>>>>>>         "replication.log.batchsize": 4096,
>>>>>>>         "replication.log.buffer.numpages": 8,
>>>>>>>         "replication.log.buffer.pagesize": 131072,
>>>>>>>         "replication.max.remote.recovery.attempts": 5,
>>>>>>>         "replication.timeout": 30,
>>>>>>>         "storage.buffercache.maxopenfiles": 2147483647,
>>>>>>>         "storage.buffercache.pagesize": 131072,
>>>>>>>         "storage.buffercache.size": 536870912,
>>>>>>>         "storage.lsm.bloomfilter.falsepositiverate": 0.01,
>>>>>>>         "storage.memorycomponent.globalbudget": 536870912,
>>>>>>>         "storage.memorycomponent.numcomponents": 2,
>>>>>>>         "storage.memorycomponent.numpages": 256,
>>>>>>>         "storage.memorycomponent.pagesize": 131072,
>>>>>>>         "storage.metadata.memorycomponent.numpages": 256,
>>>>>>>         "transaction.log.dirs": {
>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
>>>>>>>         },
>>>>>>>         "txn.commitprofiler.reportinterval": 5,
>>>>>>>         "txn.job.recovery.memorysize": 67108864,
>>>>>>>         "txn.lock.escalationthreshold": 1000,
>>>>>>>         "txn.lock.shrinktimer": 5000,
>>>>>>>         "txn.lock.timeout.sweepthreshold": 10000,
>>>>>>>         "txn.lock.timeout.waitthreshold": 60000,
>>>>>>>         "txn.log.buffer.numpages": 8,
>>>>>>>         "txn.log.buffer.pagesize": 131072,
>>>>>>>         "txn.log.checkpoint.history": 0,
>>>>>>>         "txn.log.checkpoint.lsnthreshold": 67108864,
>>>>>>>         "txn.log.checkpoint.pollfrequency": 120,
>>>>>>>         "txn.log.partitionsize": 268435456,
>>>>>>>         "web.port": 19001,
>>>>>>>         "web.queryinterface.port": 19006,
>>>>>>>         "web.secondary.port": 19005
>>>>>>>     },
>>>>>>>     "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
>>>>>>>     "metadata_node": "red16",
>>>>>>>     "ncs": [
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
>>>>>>>             "node_id": "red15",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_1"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
>>>>>>>             "node_id": "red14",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_2"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
>>>>>>>             "node_id": "red16",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_0"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
>>>>>>>             "node_id": "red11",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_5"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
>>>>>>>             "node_id": "red10",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_6"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
>>>>>>>             "node_id": "red13",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_3"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
>>>>>>>             "node_id": "red12",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_4"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
>>>>>>>             "node_id": "red6",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_10"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
>>>>>>>             "node_id": "red",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_15"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
>>>>>>>             "node_id": "red5",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_11"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
>>>>>>>             "node_id": "red8",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_8"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
>>>>>>>             "node_id": "red7",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_9"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
>>>>>>>             "node_id": "red2",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_14"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
>>>>>>>             "node_id": "red4",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_12"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
>>>>>>>             "node_id": "red3",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_13"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
>>>>>>>         },
>>>>>>>         {
>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
>>>>>>>             "node_id": "red9",
>>>>>>>             "partitions": [{"active": true, "partition_id": "partition_7"}],
>>>>>>>             "state": "ACTIVE",
>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
>>>>>>>         }
>>>>>>>     ],
>>>>>>>     "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
>>>>>>>     "state": "ACTIVE",
>>>>>>>     "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
>>>>>>> }
>>>>>>>
>>>>>>> 2. Catalog_return: 2.28G
>>>>>>>
>>>>>>> catalog_sales: 31.01G
>>>>>>>
>>>>>>> inventory: 8.63G
>>>>>>>
>>>>>>> 3. As for Pig and Hive, I always use the default configuration; I didn't
>>>>>>> set any partitioning parameters for them. For Spark, we use 200
>>>>>>> partitions, which could probably be improved but is not bad. For
>>>>>>> AsterixDB, I also left the cluster at the default partition and JVM
>>>>>>> settings (I didn't set these parameters manually).
>>>>>>>
>>>>>>>
>>>>>>>
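As a point of comparison for item 3: the Spark partition count is usually
governed by one of the two standard settings below (which one applies depends
on whether the jobs go through Spark SQL or plain RDDs; this is general Spark
knowledge rather than something stated in this thread):

    spark.sql.shuffle.partitions=200
    spark.default.parallelism=200
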
>>>>>>> On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>>>
>>>>>>>> Mingda,
>>>>>>>>
>>>>>>>>       1. Can you paste the returned JSON of http://<master
>>>>>>>> node>:19002/admin/cluster at your side? (Please replace <master node>
>>>>>>>> with the actual master node name or IP.)
>>>>>>>>       2. Can you list the individual size of each dataset involved in
>>>>>>>> the query, e.g., catalog_returns, catalog_sales, and inventory? (I
>>>>>>>> assume 100GB is the overall size?)
>>>>>>>>       3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how
>>>>>>>> many partitions are running on each machine? (It seems that your
>>>>>>>> AsterixDB configuration wouldn't saturate all CPUs for queries --- in
>>>>>>>> the current AsterixDB master, the computation parallelism is set to be
>>>>>>>> the same as the storage parallelism (i.e., the number of iodevices on
>>>>>>>> each NC). I've submitted a new patch that allows flexible computation
>>>>>>>> parallelism, which should be merged into master very soon.)
>>>>>>>>       Thanks!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Yingyi
>>>>>>>>
>>>>>>>> On Tue, Dec 20, 2016 at 5:44 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Oh, sure. When we test the 100G multiple join, we find AsterixDB is
>>>>>>>>> slower than Spark (but still faster than Pig and Hive).
>>>>>>>>> I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will
>>>>>>>>> only use 1-10G.eps in our paper.)
>>>>>>>>> And thanks for Ian's advice: *The dev list generally strips
>>>>>>>>> attachments. Maybe you can just put the config inline? Or link to a
>>>>>>>>> pastebin/gist?*
>>>>>>>>> I know why you can't see the attachments, so I moved the plots and the
>>>>>>>>> two documents to my Dropbox.
>>>>>>>>> You can find them here:
>>>>>>>>> 1-10G.eps: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
>>>>>>>>> 1-100G.eps: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
>>>>>>>>> cc_conf.pdf: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
>>>>>>>>> CompleteQuery.pdf: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <tcondie.ucla@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Mingda: Please also share the numbers for 100GB, which show AsterixDB
>>>>>>>>>> not quite doing as well as Spark. These 100GB results will not be in
>>>>>>>>>> our submission version, since they’re not needed for the desired
>>>>>>>>>> message: picking the right join order matters. Nevertheless, I’d like
>>>>>>>>>> to get a
