asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yingyi Bu <buyin...@gmail.com>
Subject Re: Time of Multiple Joins in AsterixDB
Date Thu, 22 Dec 2016 19:37:28 GMT
No need to reload data anymore :-)

Best,
Yingyi

On Thu, Dec 22, 2016 at 11:36 AM, Yingyi Bu <buyingyi@gmail.com> wrote:

> Indeed, the change was merged yesterday.
> If you grab the latest master, the computation parallelism can be set by
> the parameter compiler.parallelism:
> -- 0, the default, means to use the storage parallelism as the computation
> parallelism;
> -- a negative value means to use all available cores in the cluster;
> -- a positive value means to use that number of cores, but it will fall
> back to using all available cores if the number is too large.
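>
> For example, assuming the parameter goes in the cc section of cc.conf
> alongside the other compiler.* settings mentioned further down in this
> thread (just a sketch; 64 here would be 16 NCs * 4 cores, 0 keeps the
> default, and -1 uses all available cores):
>
>     compiler.parallelism=64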
>
> Best,
> Yingyi
>
>
> On Thu, Dec 22, 2016 at 10:59 AM, Mike Carey <dtabass@gmail.com> wrote:
>
>> It would definitely also be fun to see how the newest more flexible
>> parallelism stuff handles this!  (Where you'd specify storage parallelism
>> based on drives, and compute parallelism based on cores, both spread across
>> all of the cluster's resources.)
>>
>>
>> On 12/22/16 10:57 AM, Yingyi Bu wrote:
>>
>>> Mingda,
>>>
>>>
>>>       There is one more thing that you can do without the need to reload
>>> your data:
>>>
>>>       -- Enlarge the buffer cache memory budget to 8GB so that the
>>> datasets (roughly 42GB in total across the 16 nodes, i.e., about 2.6GB
>>> per node) can fit into the buffer cache:
>>>           "storage.buffercache.size": 536870912
>>>           ->
>>>           "storage.buffercache.size": 8589934592
>>>
>>>       You don't need to reload the data; you only need to restart the
>>> AsterixDB instance.
>>>       Thanks!
>>>
>>> Best,
>>> Yingyi
>>>
>>>
>>>
>>> On Wed, Dec 21, 2016 at 9:22 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>
>>>> Nice!!
>>>>
>>>> On Dec 21, 2016 8:43 PM, "Yingyi Bu" <buyingyi@gmail.com> wrote:
>>>>
>>>> Cool, thanks, Mingda!
>>>>> Look forward to the new numbers!
>>>>>
>>>>> Best,
>>>>> Yingyi
>>>>>
>>>>> On Wed, Dec 21, 2016 at 7:13 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>
>>>>>> Dear Yingyi,
>>>>>>
>>>>>> Thanks for your suggestion. I have reset AsterixDB and retested it in the
>>>>>> new environment on 100G and 10G. All of the tests (both the good and the
>>>>>> bad join orders) now run about twice as fast.
>>>>>> I will finish all the tests and update the results later.
>>>>>>
>>>>>> Bests,
>>>>>> Mingda
>>>>>>
>>>>>> On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mingda,
>>>>>>>
>>>>>>>       I think that in your setting, a better configuration for AsterixDB
>>>>>>> might be to use 64 partitions, i.e., 4 cores * 16 machines.
>>>>>>>       1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16
>>>>>>>           -->
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
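>>>>>>>
>>>>>>>           (If your cc.conf expects iodevices as a single comma-separated
>>>>>>> list rather than repeated keys, the same change would be written on one
>>>>>>> line, roughly:
>>>>>>>           iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,/home/clash/asterixStorage/asterixdb5/red16-2,/home/clash/asterixStorage/asterixdb5/red16-3,/home/clash/asterixStorage/asterixdb5/red16-4
>>>>>>>           and likewise for the other NCs.)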
>>>>>>>
>>>>>>>       2. Assume you have 64 partitions: 31.01G / 64 ~= 0.5G.  That means
>>>>>>> you'd better have a 512MB memory budget for each joiner so as to make the
>>>>>>> join memory-resident.
>>>>>>>           To achieve that, in the cc section in the cc.conf, you could add:
>>>>>>>           compiler.joinmemory=536870912
>>>>>>>
>>>>>>>       3. For the JVM setting, 1024MB is too small for the NC.
>>>>>>>           In the shared NC section in cc.conf, you can add:
>>>>>>>           jvm.args=-Xmx16G
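>>>>>>>
>>>>>>>           Putting 2 and 3 together, the additions would look roughly like
>>>>>>> this in cc.conf (the [cc] and [nc] section headers are just a sketch --
>>>>>>> use whatever headers your cc.conf already has for the cc section and the
>>>>>>> shared NC section):
>>>>>>>
>>>>>>>           [cc]
>>>>>>>           compiler.joinmemory=536870912
>>>>>>>
>>>>>>>           [nc]
>>>>>>>           jvm.args=-Xmx16G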
>>>>>>>
>>>>>>>       4. For Pig and Hive, you can set the maximum mapper/reducer numbers
>>>>>>> in the MapReduce configuration, e.g., at most 4 mappers per machine and at
>>>>>>> most 4 reducers per machine.
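>>>>>>>
>>>>>>>           For example, with classic MRv1 these are per-TaskTracker settings
>>>>>>> in mapred-site.xml (just a sketch; on YARN the per-node task count is
>>>>>>> instead bounded by the container memory/vcore settings):
>>>>>>>
>>>>>>>           <property>
>>>>>>>             <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>>>>             <value>4</value>
>>>>>>>           </property>
>>>>>>>           <property>
>>>>>>>             <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>>>>             <value>4</value>
>>>>>>>           </property>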
>>>>>>>
>>>>>>>       5. I'm not super-familiar with hyper-threading, but it might be
>>>>>>> worth trying 8 partitions per machine, i.e., 128 partitions in total.
>>>>>>>
>>>>>>>       To validate if the new settings work, you can go to the
>>>>>>> admin/cluster page to double check.
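>>>>>>>           For example, something like
>>>>>>>           curl http://<master node>:19002/admin/cluster
>>>>>>> and then check the cluster.partitions and compiler.joinmemory entries in
>>>>>>> the returned JSON.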
>>>>>>>       Pls keep us updated and let us know if you run into any issue.
>>>>>>>       Thanks!
>>>>>>>
>>>>>>> Best,
>>>>>>> Yingyi
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 20, 2016 at 9:26 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Yingyi,
>>>>>>>> 1. Here is the JSON returned by http://<master node>:19002/admin/cluster :
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "cc": {
>>>>>>>>         "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
>>>>>>>>         "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
>>>>>>>>         "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
>>>>>>>>     },
>>>>>>>>     "config": {
>>>>>>>>         "api.port": 19002,
>>>>>>>>         "cc.java.opts": "-Xmx1024m",
>>>>>>>>         "cluster.partitions": {
>>>>>>>>             "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>>             "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
>>>>>>>>             "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
>>>>>>>>             "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
>>>>>>>>             "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
>>>>>>>>             "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
>>>>>>>>             "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
>>>>>>>>             "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
>>>>>>>>             "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
>>>>>>>>             "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
>>>>>>>>             "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
>>>>>>>>             "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
>>>>>>>>             "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
>>>>>>>>             "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
>>>>>>>>             "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
>>>>>>>>             "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
>>>>>>>>         },
>>>>>>>>         "compiler.framesize": 32768,
>>>>>>>>         "compiler.groupmemory": 33554432,
>>>>>>>>         "compiler.joinmemory": 33554432,
>>>>>>>>         "compiler.pregelix.home": "~/pregelix",
>>>>>>>>         "compiler.sortmemory": 33554432,
>>>>>>>>         "core.dump.paths": {
>>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
>>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
>>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
>>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
>>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
>>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
>>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
>>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
>>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
>>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
>>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
>>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
>>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
>>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
>>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
>>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
>>>>>>>>         },
>>>>>>>>         "feed.central.manager.port": 4500,
>>>>>>>>         "feed.max.threshold.period": 5,
>>>>>>>>         "feed.memory.available.wait.timeout": 10,
>>>>>>>>         "feed.memory.global.budget": 67108864,
>>>>>>>>         "feed.pending.work.threshold": 50,
>>>>>>>>         "feed.port": 19003,
>>>>>>>>         "instance.name": "DEFAULT_INSTANCE",
>>>>>>>>         "log.level": "WARNING",
>>>>>>>>         "max.wait.active.cluster": 60,
>>>>>>>>         "metadata.callback.port": 0,
>>>>>>>>         "metadata.node": "red16",
>>>>>>>>         "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>>         "metadata.port": 0,
>>>>>>>>         "metadata.registration.timeout.secs": 60,
>>>>>>>>         "nc.java.opts": "-Xmx1024m",
>>>>>>>>         "node.partitions": {
>>>>>>>>             "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
>>>>>>>>             "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
>>>>>>>>             "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
>>>>>>>>             "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
>>>>>>>>             "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
>>>>>>>>             "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
>>>>>>>>             "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
>>>>>>>>             "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
>>>>>>>>             "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
>>>>>>>>             "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
>>>>>>>>             "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
>>>>>>>>             "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
>>>>>>>>             "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
>>>>>>>>             "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
>>>>>>>>             "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
>>>>>>>>             "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
>>>>>>>>         },
>>>>>>>>         "node.stores": {
>>>>>>>>             "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
>>>>>>>>             "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
>>>>>>>>             "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
>>>>>>>>             "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
>>>>>>>>             "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
>>>>>>>>             "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
>>>>>>>>             "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
>>>>>>>>             "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
>>>>>>>>             "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
>>>>>>>>             "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
>>>>>>>>             "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
>>>>>>>>             "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
>>>>>>>>             "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
>>>>>>>>             "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
>>>>>>>>             "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
>>>>>>>>             "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
>>>>>>>>         },
>>>>>>>>         "plot.activate": false,
>>>>>>>>         "replication.enabled": false,
>>>>>>>>         "replication.factor": 2,
>>>>>>>>         "replication.log.batchsize": 4096,
>>>>>>>>         "replication.log.buffer.numpages": 8,
>>>>>>>>         "replication.log.buffer.pagesize": 131072,
>>>>>>>>         "replication.max.remote.recovery.attempts": 5,
>>>>>>>>         "replication.timeout": 30,
>>>>>>>>         "storage.buffercache.maxopenfiles": 2147483647,
>>>>>>>>         "storage.buffercache.pagesize": 131072,
>>>>>>>>         "storage.buffercache.size": 536870912,
>>>>>>>>         "storage.lsm.bloomfilter.falsepositiverate": 0.01,
>>>>>>>>         "storage.memorycomponent.globalbudget": 536870912,
>>>>>>>>         "storage.memorycomponent.numcomponents": 2,
>>>>>>>>         "storage.memorycomponent.numpages": 256,
>>>>>>>>         "storage.memorycomponent.pagesize": 131072,
>>>>>>>>         "storage.metadata.memorycomponent.numpages": 256,
>>>>>>>>         "transaction.log.dirs": {
>>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
>>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
>>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
>>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
>>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
>>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
>>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
>>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
>>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
>>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
>>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
>>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
>>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
>>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
>>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
>>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
>>>>>>>>         },
>>>>>>>>         "txn.commitprofiler.reportinterval": 5,
>>>>>>>>         "txn.job.recovery.memorysize": 67108864,
>>>>>>>>         "txn.lock.escalationthreshold": 1000,
>>>>>>>>         "txn.lock.shrinktimer": 5000,
>>>>>>>>         "txn.lock.timeout.sweepthreshold": 10000,
>>>>>>>>         "txn.lock.timeout.waitthreshold": 60000,
>>>>>>>>         "txn.log.buffer.numpages": 8,
>>>>>>>>         "txn.log.buffer.pagesize": 131072,
>>>>>>>>         "txn.log.checkpoint.history": 0,
>>>>>>>>         "txn.log.checkpoint.lsnthreshold": 67108864,
>>>>>>>>         "txn.log.checkpoint.pollfrequency": 120,
>>>>>>>>         "txn.log.partitionsize": 268435456,
>>>>>>>>         "web.port": 19001,
>>>>>>>>         "web.queryinterface.port": 19006,
>>>>>>>>         "web.secondary.port": 19005
>>>>>>>>     },
>>>>>>>>     "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
>>>>>>>>     "metadata_node": "red16",
>>>>>>>>     "ncs": [
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
>>>>>>>>             "node_id": "red15",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_1"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
>>>>>>>>             "node_id": "red14",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_2"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
>>>>>>>>             "node_id": "red16",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_0"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
>>>>>>>>             "node_id": "red11",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_5"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
>>>>>>>>             "node_id": "red10",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_6"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
>>>>>>>>             "node_id": "red13",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_3"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
>>>>>>>>             "node_id": "red12",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_4"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
>>>>>>>>             "node_id": "red6",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_10"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
>>>>>>>>             "node_id": "red",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_15"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
>>>>>>>>             "node_id": "red5",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_11"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
>>>>>>>>             "node_id": "red8",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_8"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
>>>>>>>>             "node_id": "red7",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_9"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
>>>>>>>>             "node_id": "red2",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_14"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
>>>>>>>>             "node_id": "red4",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_12"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
>>>>>>>>             "node_id": "red3",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_13"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>             "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
>>>>>>>>             "node_id": "red9",
>>>>>>>>             "partitions": [{
>>>>>>>>                 "active": true,
>>>>>>>>                 "partition_id": "partition_7"
>>>>>>>>             }],
>>>>>>>>             "state": "ACTIVE",
>>>>>>>>             "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
>>>>>>>>             "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump"
>>>>>>>>         }
>>>>>>>>     ],
>>>>>>>>     "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
>>>>>>>>     "state": "ACTIVE",
>>>>>>>>     "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
>>>>>>>> }
>>>>>>>>
>>>>>>>> 2. catalog_returns: 2.28G
>>>>>>>>
>>>>>>>> catalog_sales: 31.01G
>>>>>>>>
>>>>>>>> inventory: 8.63G
>>>>>>>>
>>>>>>>> 3. As for Pig and Hive, I always use the default configuration; I didn't
>>>>>>>> set any partition-related parameters for them. For Spark, we use 200
>>>>>>>> partitions, which could probably be tuned further but is not bad. For
>>>>>>>> AsterixDB, I also set up the cluster with the default partition and JVM
>>>>>>>> settings (I didn't manually set these parameters).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Mingda,
>>>>>>>>>
>>>>>>>>>       1. Can you paste the returned JSON of http://<master
>>>>>>>>> node>:19002/admin/cluster at your side? (Pls replace <master node> with
>>>>>>>>> the actual master node name or IP.)
>>>>>>>>>       2. Can you list the individual size of each dataset involved in the
>>>>>>>>> query, e.g., catalog_returns, catalog_sales, and inventory?  (I assume
>>>>>>>>> 100GB is the overall size?)
>>>>>>>>>       3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how
>>>>>>>>> many partitions are running on each machine?  (It seems that your
>>>>>>>>> AsterixDB configuration wouldn't saturate all CPUs for queries --- in the
>>>>>>>>> current AsterixDB master, the computation parallelism is set to be the
>>>>>>>>> same as the storage parallelism, i.e., the number of iodevices on each
>>>>>>>>> NC. I've submitted a new patch that allows flexible computation
>>>>>>>>> parallelism, which should be able to get merged into master very soon.)
>>>>>>>>>       Thanks!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Yingyi
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2016 at 5:44 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Oh, sure. When we tested the 100G multiple join, we found that AsterixDB
>>>>>>>>>> is slower than Spark (but still faster than Pig and Hive).
>>>>>>>>>> I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will only
>>>>>>>>>> use 1-10G.eps in our paper.)
>>>>>>>>>> And thanks for Ian's advice: *The dev list generally strips attachments.
>>>>>>>>>> Maybe you can just put the config inline? Or link to a pastebin/gist?*
>>>>>>>>>> Now I know why you can't see the attachments, so I moved the plots along
>>>>>>>>>> with two documents to my Dropbox.
>>>>>>>>>> You can find
>>>>>>>>>> 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
>>>>>>>>>> 1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
>>>>>>>>>> cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
>>>>>>>>>> CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
>>>>
>>>>>>>>>> On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <tcondie.ucla@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Mingda: Please also share the numbers for 100GB, which show AsterixDB
>>>>>>>>>>> not quite doing as well as Spark. These 100GB results will not be in our
>>>>>>>>>>> submission version, since they’re not needed for the desired message:
>>>>>>>>>>> picking the right join order matters. Nevertheless, I’d like to get a
>>>>>>
>>>>>>
>
