asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Time of Multiple Joins in AsterixDB
Date Thu, 22 Dec 2016 20:40:32 GMT
So to use max storage parallelism for scans but max core parallelism
subsequently, one would just specify a large positive value >= the
number of available cores? (E.g., 10000)


On 12/22/16 11:37 AM, Yingyi Bu wrote:
> No need to reload data anymore :-)
>
> Best,
> Yingyi
>
> On Thu, Dec 22, 2016 at 11:36 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
>
>> Indeed, the change was merged yesterday.
>> If you grab the latest master, the computation parallelism can be set by
>> the parameter compiler.parallelism:
>> -- 0, the default, means to use the storage parallelism as the computation
>> parallelism;
>> -- a negative value means to use all available cores in the cluster;
>> -- a positive value means to use that number of cores, but it falls back to
>> all available cores if the number is too large.
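>> For example, a sketch assuming the ini-style cc.conf layout used later in
>> this thread (the same cc section where compiler.joinmemory is set); treat
>> the exact section header as an assumption:
>>
>>     [cc]
>>     compiler.parallelism=64
>>
>> With 16 NCs of 4 cores each, 64 would use every core; a larger value simply
>> falls back to all available cores, per the rule above.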
>>
>> Best,
>> Yingyi
>>
>>
>> On Thu, Dec 22, 2016 at 10:59 AM, Mike Carey <dtabass@gmail.com> wrote:
>>
>>> It would definitely also be fun to see how the newest more flexible
>>> parallelism stuff handles this!  (Where you'd specify storage parallelism
>>> based on drives, and compute parallelism based on cores, both spread across
>>> all of the cluster's resources.)
>>>
>>>
>>> On 12/22/16 10:57 AM, Yingyi Bu wrote:
>>>
>>>> Mingda,
>>>>
>>>>
>>>>        There is one more thing that you can do without the need to reload
>>>> your data:
>>>>
>>>>        -- Enlarge the buffer cache memory budget to 8GB so that the datasets
>>>> can fit into the buffer cache:
>>>>            "storage.buffercache.size": 536870912
>>>>            ->
>>>>            "storage.buffercache.size": 8589934592
>>>>
>>>>        You don't need to reload data; you only need to restart the AsterixDB
>>>> instance.
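>>>>        (A sketch of the byte arithmetic and the corresponding config line;
>>>> exactly which section of your config holds the storage.* parameters is an
>>>> assumption on my part, so adjust as needed:
>>>>
>>>>            # 8GB = 8 * 1024 * 1024 * 1024 bytes
>>>>            storage.buffercache.size=8589934592
>>>>
>>>>        then restart the instance as described above.)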
>>>>        Thanks!
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>>
>>>>
>>>> On Wed, Dec 21, 2016 at 9:22 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>
>>>> Nice!!
>>>>> On Dec 21, 2016 8:43 PM, "Yingyi Bu" <buyingyi@gmail.com> wrote:
>>>>>
>>>>> Cool, thanks, Mingda!
>>>>>> Look forward to the new numbers!
>>>>>>
>>>>>> Best,
>>>>>> Yingyi
>>>>>>
>>>>>> On Wed, Dec 21, 2016 at 7:13 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>
>>>>>>> Dear Yingyi,
>>>>>>> Thanks for your suggestion. I have reset AsterixDB and retested it with
>>>>>>> the new environment on 100G and 10G. All of the tests (good and bad join
>>>>>>> orders) now run about twice as fast.
>>>>>>> I will finish all the tests and update the results later.
>>>>>>>
>>>>>>> Bests,
>>>>>>> Mingda
>>>>>>>
>>>>>>> On Tue, Dec 20, 2016 at 10:20 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Mingda,
>>>>>>>>
>>>>>>>>        I think that in your setting, a better configuration for AsterixDB
>>>>>>>> might be to use 64 partitions, i.e., 4 cores * 16 machines.
>>>>>>>>
>>>>>>>>        1. To achieve that, you have to have 4 iodevices on each NC, e.g.:
>>>>>>>>            iodevices=/home/clash/asterixStorage/asterixdb5/red16
>>>>>>>>            -->
>>>>>>>>            iodevices=/home/clash/asterixStorage/asterixdb5/red16-1,
>>>>>>>>            iodevices=/home/clash/asterixStorage/asterixdb5/red16-2,
>>>>>>>>            iodevices=/home/clash/asterixStorage/asterixdb5/red16-3,
>>>>>>>>            iodevices=/home/clash/asterixStorage/asterixdb5/red16-4
>>>>>>>>
>>>>>>>>        2. Assuming you have 64 partitions, 31.01G/64 ~= 0.5G. That means
>>>>>>>> you'd better have a 512MB memory budget for each joiner so as to make the
>>>>>>>> join memory-resident.
>>>>>>>>            To achieve that, in the cc section in cc.conf, you could add:
>>>>>>>>            compiler.joinmemory=536870912
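>>>>>>>>            (For reference, 536870912 = 512 * 1024 * 1024 bytes, i.e. 512MB
>>>>>>>> per joiner, which covers the ~0.5G of catalog_sales that lands in each of
>>>>>>>> the 64 partitions.)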
>>>>>>>>
>>>>>>>>        3. For the JVM setting, 1024MB is too small for the NC.
>>>>>>>>            In the shared NC section in cc.conf, you can add:
>>>>>>>>            jvm.args=-Xmx16G
>>>>>>>>
>>>>>>>>        4. For Pig and Hive, you can set the maximum mapper/reducer numbers
>>>>>>>> in the MapReduce configuration, e.g., at most 4 mappers per machine and at
>>>>>>>> most 4 reducers per machine (see the sketch after item 5 below).
>>>>>>>>
>>>>>>>>        5. I'm not super-familiar with hyper-threading, but it might be
>>>>>>>> worth trying 8 partitions per machine, i.e., 128 partitions in total.
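>>>>>>>>        The sketch for item 4, assuming classic (Hadoop 1.x) MapReduce;
>>>>>>>> under YARN the knobs are different. These are per-TaskTracker caps, shown
>>>>>>>> as key=value pairs for mapred-site.xml:
>>>>>>>>
>>>>>>>>            mapred.tasktracker.map.tasks.maximum=4
>>>>>>>>            mapred.tasktracker.reduce.tasks.maximum=4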
>>>>>>>>
>>>>>>>>        To validate that the new settings work, you can go to the
>>>>>>>> admin/cluster page to double-check.
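>>>>>>>>        For example (assuming the same web API port 19002 used elsewhere in
>>>>>>>> this thread), something like
>>>>>>>>
>>>>>>>>            curl http://<master node>:19002/admin/cluster
>>>>>>>>
>>>>>>>> should print the cluster configuration as JSON, including the partitions
>>>>>>>> reported per NC.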
>>>>>>>>        Pls keep us updated and let us know if you run into any issue.
>>>>>>>>        Thanks!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Yingyi
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Dec 20, 2016 at 9:26 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Dear Yingyi,
>>>>>>>>>
>>>>>>>>> 1. Here is what http://<master node>:19002/admin/cluster returns:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>     "cc": {
>>>>>>>>>         "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/config",
>>>>>>>>>         "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/stats",
>>>>>>>>>         "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/cc/threaddump"
>>>>>>>>>     },
>>>>>>>>>     "config": {
>>>>>>>>>         "api.port": 19002,
>>>>>>>>>         "cc.java.opts": "-Xmx1024m",
>>>>>>>>>         "cluster.partitions": {
>>>>>>>>>             "0": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>>>             "1": "ID:1, Original Node: red15, IODevice: 0, Active Node: red15",
>>>>>>>>>             "2": "ID:2, Original Node: red14, IODevice: 0, Active Node: red14",
>>>>>>>>>             "3": "ID:3, Original Node: red13, IODevice: 0, Active Node: red13",
>>>>>>>>>             "4": "ID:4, Original Node: red12, IODevice: 0, Active Node: red12",
>>>>>>>>>             "5": "ID:5, Original Node: red11, IODevice: 0, Active Node: red11",
>>>>>>>>>             "6": "ID:6, Original Node: red10, IODevice: 0, Active Node: red10",
>>>>>>>>>             "7": "ID:7, Original Node: red9, IODevice: 0, Active Node: red9",
>>>>>>>>>             "8": "ID:8, Original Node: red8, IODevice: 0, Active Node: red8",
>>>>>>>>>             "9": "ID:9, Original Node: red7, IODevice: 0, Active Node: red7",
>>>>>>>>>             "10": "ID:10, Original Node: red6, IODevice: 0, Active Node: red6",
>>>>>>>>>             "11": "ID:11, Original Node: red5, IODevice: 0, Active Node: red5",
>>>>>>>>>             "12": "ID:12, Original Node: red4, IODevice: 0, Active Node: red4",
>>>>>>>>>             "13": "ID:13, Original Node: red3, IODevice: 0, Active Node: red3",
>>>>>>>>>             "14": "ID:14, Original Node: red2, IODevice: 0, Active Node: red2",
>>>>>>>>>             "15": "ID:15, Original Node: red, IODevice: 0, Active Node: red"
>>>>>>>>>         },
>>>>>>>>>         "compiler.framesize": 32768,
>>>>>>>>>         "compiler.groupmemory": 33554432,
>>>>>>>>>         "compiler.joinmemory": 33554432,
>>>>>>>>>         "compiler.pregelix.home": "~/pregelix",
>>>>>>>>>         "compiler.sortmemory": 33554432,
>>>>>>>>>         "core.dump.paths": {
>>>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/coredump",
>>>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/coredump",
>>>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/coredump",
>>>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/coredump",
>>>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/coredump",
>>>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/coredump",
>>>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/coredump",
>>>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/coredump",
>>>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/coredump",
>>>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/coredump",
>>>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/coredump",
>>>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/coredump",
>>>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/coredump",
>>>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/coredump",
>>>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/coredump",
>>>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/coredump"
>>>>>>>>>         },
>>>>>>>>>         "feed.central.manager.port": 4500,
>>>>>>>>>         "feed.max.threshold.period": 5,
>>>>>>>>>         "feed.memory.available.wait.timeout": 10,
>>>>>>>>>         "feed.memory.global.budget": 67108864,
>>>>>>>>>         "feed.pending.work.threshold": 50,
>>>>>>>>>         "feed.port": 19003,
>>>>>>>>>         "instance.name": "DEFAULT_INSTANCE",
>>>>>>>>>         "log.level": "WARNING",
>>>>>>>>>         "max.wait.active.cluster": 60,
>>>>>>>>>         "metadata.callback.port": 0,
>>>>>>>>>         "metadata.node": "red16",
>>>>>>>>>         "metadata.partition": "ID:0, Original Node: red16, IODevice: 0, Active Node: red16",
>>>>>>>>>         "metadata.port": 0,
>>>>>>>>>         "metadata.registration.timeout.secs": 60,
>>>>>>>>>         "nc.java.opts": "-Xmx1024m",
>>>>>>>>>         "node.partitions": {
>>>>>>>>>             "red": ["ID:15, Original Node: red, IODevice: 0, Active Node: red"],
>>>>>>>>>             "red10": ["ID:6, Original Node: red10, IODevice: 0, Active Node: red10"],
>>>>>>>>>             "red11": ["ID:5, Original Node: red11, IODevice: 0, Active Node: red11"],
>>>>>>>>>             "red12": ["ID:4, Original Node: red12, IODevice: 0, Active Node: red12"],
>>>>>>>>>             "red13": ["ID:3, Original Node: red13, IODevice: 0, Active Node: red13"],
>>>>>>>>>             "red14": ["ID:2, Original Node: red14, IODevice: 0, Active Node: red14"],
>>>>>>>>>             "red15": ["ID:1, Original Node: red15, IODevice: 0, Active Node: red15"],
>>>>>>>>>             "red16": ["ID:0, Original Node: red16, IODevice: 0, Active Node: red16"],
>>>>>>>>>             "red2": ["ID:14, Original Node: red2, IODevice: 0, Active Node: red2"],
>>>>>>>>>             "red3": ["ID:13, Original Node: red3, IODevice: 0, Active Node: red3"],
>>>>>>>>>             "red4": ["ID:12, Original Node: red4, IODevice: 0, Active Node: red4"],
>>>>>>>>>             "red5": ["ID:11, Original Node: red5, IODevice: 0, Active Node: red5"],
>>>>>>>>>             "red6": ["ID:10, Original Node: red6, IODevice: 0, Active Node: red6"],
>>>>>>>>>             "red7": ["ID:9, Original Node: red7, IODevice: 0, Active Node: red7"],
>>>>>>>>>             "red8": ["ID:8, Original Node: red8, IODevice: 0, Active Node: red8"],
>>>>>>>>>             "red9": ["ID:7, Original Node: red9, IODevice: 0, Active Node: red9"]
>>>>>>>>>         },
>>>>>>>>>         "node.stores": {
>>>>>>>>>             "red": ["/home/clash/asterixStorage/asterixdb5/red/storage"],
>>>>>>>>>             "red10": ["/home/clash/asterixStorage/asterixdb5/red10/storage"],
>>>>>>>>>             "red11": ["/home/clash/asterixStorage/asterixdb5/red11/storage"],
>>>>>>>>>             "red12": ["/home/clash/asterixStorage/asterixdb5/red12/storage"],
>>>>>>>>>             "red13": ["/home/clash/asterixStorage/asterixdb5/red13/storage"],
>>>>>>>>>             "red14": ["/home/clash/asterixStorage/asterixdb5/red14/storage"],
>>>>>>>>>             "red15": ["/home/clash/asterixStorage/asterixdb5/red15/storage"],
>>>>>>>>>             "red16": ["/home/clash/asterixStorage/asterixdb5/red16/storage"],
>>>>>>>>>             "red2": ["/home/clash/asterixStorage/asterixdb5/red2/storage"],
>>>>>>>>>             "red3": ["/home/clash/asterixStorage/asterixdb5/red3/storage"],
>>>>>>>>>             "red4": ["/home/clash/asterixStorage/asterixdb5/red4/storage"],
>>>>>>>>>             "red5": ["/home/clash/asterixStorage/asterixdb5/red5/storage"],
>>>>>>>>>             "red6": ["/home/clash/asterixStorage/asterixdb5/red6/storage"],
>>>>>>>>>             "red7": ["/home/clash/asterixStorage/asterixdb5/red7/storage"],
>>>>>>>>>             "red8": ["/home/clash/asterixStorage/asterixdb5/red8/storage"],
>>>>>>>>>             "red9": ["/home/clash/asterixStorage/asterixdb5/red9/storage"]
>>>>>>>>>         },
>>>>>>>>>         "plot.activate": false,
>>>>>>>>>         "replication.enabled": false,
>>>>>>>>>         "replication.factor": 2,
>>>>>>>>>         "replication.log.batchsize": 4096,
>>>>>>>>>         "replication.log.buffer.numpages": 8,
>>>>>>>>>         "replication.log.buffer.pagesize": 131072,
>>>>>>>>>         "replication.max.remote.recovery.attempts": 5,
>>>>>>>>>         "replication.timeout": 30,
>>>>>>>>>         "storage.buffercache.maxopenfiles": 2147483647,
>>>>>>>>>         "storage.buffercache.pagesize": 131072,
>>>>>>>>>         "storage.buffercache.size": 536870912,
>>>>>>>>>         "storage.lsm.bloomfilter.falsepositiverate": 0.01,
>>>>>>>>>         "storage.memorycomponent.globalbudget": 536870912,
>>>>>>>>>         "storage.memorycomponent.numcomponents": 2,
>>>>>>>>>         "storage.memorycomponent.numpages": 256,
>>>>>>>>>         "storage.memorycomponent.pagesize": 131072,
>>>>>>>>>         "storage.metadata.memorycomponent.numpages": 256,
>>>>>>>>>         "transaction.log.dirs": {
>>>>>>>>>             "red": "/home/clash/asterixStorage/asterixdb5/red/txnlog",
>>>>>>>>>             "red10": "/home/clash/asterixStorage/asterixdb5/red10/txnlog",
>>>>>>>>>             "red11": "/home/clash/asterixStorage/asterixdb5/red11/txnlog",
>>>>>>>>>             "red12": "/home/clash/asterixStorage/asterixdb5/red12/txnlog",
>>>>>>>>>             "red13": "/home/clash/asterixStorage/asterixdb5/red13/txnlog",
>>>>>>>>>             "red14": "/home/clash/asterixStorage/asterixdb5/red14/txnlog",
>>>>>>>>>             "red15": "/home/clash/asterixStorage/asterixdb5/red15/txnlog",
>>>>>>>>>             "red16": "/home/clash/asterixStorage/asterixdb5/red16/txnlog",
>>>>>>>>>             "red2": "/home/clash/asterixStorage/asterixdb5/red2/txnlog",
>>>>>>>>>             "red3": "/home/clash/asterixStorage/asterixdb5/red3/txnlog",
>>>>>>>>>             "red4": "/home/clash/asterixStorage/asterixdb5/red4/txnlog",
>>>>>>>>>             "red5": "/home/clash/asterixStorage/asterixdb5/red5/txnlog",
>>>>>>>>>             "red6": "/home/clash/asterixStorage/asterixdb5/red6/txnlog",
>>>>>>>>>             "red7": "/home/clash/asterixStorage/asterixdb5/red7/txnlog",
>>>>>>>>>             "red8": "/home/clash/asterixStorage/asterixdb5/red8/txnlog",
>>>>>>>>>             "red9": "/home/clash/asterixStorage/asterixdb5/red9/txnlog"
>>>>>>>>>         },
>>>>>>>>>         "txn.commitprofiler.reportinterval": 5,
>>>>>>>>>         "txn.job.recovery.memorysize": 67108864,
>>>>>>>>>         "txn.lock.escalationthreshold": 1000,
>>>>>>>>>         "txn.lock.shrinktimer": 5000,
>>>>>>>>>         "txn.lock.timeout.sweepthreshold": 10000,
>>>>>>>>>         "txn.lock.timeout.waitthreshold": 60000,
>>>>>>>>>         "txn.log.buffer.numpages": 8,
>>>>>>>>>         "txn.log.buffer.pagesize": 131072,
>>>>>>>>>         "txn.log.checkpoint.history": 0,
>>>>>>>>>         "txn.log.checkpoint.lsnthreshold": 67108864,
>>>>>>>>>         "txn.log.checkpoint.pollfrequency": 120,
>>>>>>>>>         "txn.log.partitionsize": 268435456,
>>>>>>>>>         "web.port": 19001,
>>>>>>>>>         "web.queryinterface.port": 19006,
>>>>>>>>>         "web.secondary.port": 19005
>>>>>>>>>     },
>>>>>>>>>     "fullShutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown?all=true",
>>>>>>>>>     "metadata_node": "red16",
>>>>>>>>>     "ncs": [
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/config",
>>>>>>>>>           "node_id": "red15", "partitions": [{ "active": true, "partition_id": "partition_1" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red15/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/config",
>>>>>>>>>           "node_id": "red14", "partitions": [{ "active": true, "partition_id": "partition_2" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red14/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/config",
>>>>>>>>>           "node_id": "red16", "partitions": [{ "active": true, "partition_id": "partition_0" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red16/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/config",
>>>>>>>>>           "node_id": "red11", "partitions": [{ "active": true, "partition_id": "partition_5" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red11/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/config",
>>>>>>>>>           "node_id": "red10", "partitions": [{ "active": true, "partition_id": "partition_6" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red10/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/config",
>>>>>>>>>           "node_id": "red13", "partitions": [{ "active": true, "partition_id": "partition_3" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red13/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/config",
>>>>>>>>>           "node_id": "red12", "partitions": [{ "active": true, "partition_id": "partition_4" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red12/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/config",
>>>>>>>>>           "node_id": "red6", "partitions": [{ "active": true, "partition_id": "partition_10" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red6/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/config",
>>>>>>>>>           "node_id": "red", "partitions": [{ "active": true, "partition_id": "partition_15" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/config",
>>>>>>>>>           "node_id": "red5", "partitions": [{ "active": true, "partition_id": "partition_11" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red5/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/config",
>>>>>>>>>           "node_id": "red8", "partitions": [{ "active": true, "partition_id": "partition_8" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red8/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/config",
>>>>>>>>>           "node_id": "red7", "partitions": [{ "active": true, "partition_id": "partition_9" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red7/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/config",
>>>>>>>>>           "node_id": "red2", "partitions": [{ "active": true, "partition_id": "partition_14" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red2/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/config",
>>>>>>>>>           "node_id": "red4", "partitions": [{ "active": true, "partition_id": "partition_12" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red4/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/config",
>>>>>>>>>           "node_id": "red3", "partitions": [{ "active": true, "partition_id": "partition_13" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red3/threaddump" },
>>>>>>>>>         { "configUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/config",
>>>>>>>>>           "node_id": "red9", "partitions": [{ "active": true, "partition_id": "partition_7" }], "state": "ACTIVE",
>>>>>>>>>           "statsUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/stats",
>>>>>>>>>           "threadDumpUri": "http://scai01.cs.ucla.edu:19002/admin/cluster/node/red9/threaddump" }
>>>>>>>>>     ],
>>>>>>>>>     "shutdownUri": "http://scai01.cs.ucla.edu:19002/admin/shutdown",
>>>>>>>>>     "state": "ACTIVE",
>>>>>>>>>     "versionUri": "http://scai01.cs.ucla.edu:19002/admin/version"
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> 2. The individual dataset sizes:
>>>>>>>>>    catalog_returns: 2.28G
>>>>>>>>>    catalog_sales: 31.01G
>>>>>>>>>    inventory: 8.63G
>>>>>>>>>
>>>>>>>>> 3. As for Pig and Hive, I always use the default configuration; I didn't
>>>>>>>>> set any partition-related settings for them. For Spark, we use 200
>>>>>>>>> partitions, which could perhaps be improved but is not bad. For AsterixDB,
>>>>>>>>> I also set up the cluster using the default values for partitions and the
>>>>>>>>> JVM (I didn't manually set these parameters).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Dec 20, 2016 at 5:58 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Mingda,
>>>>>>>>>>
>>>>>>>>>>        1. Can you paste the returned JSON of http://<master
>>>>>>>>>> node>:19002/admin/cluster at your side? (Pls replace <master node> with
>>>>>>>>>> the actual master node name or IP.)
>>>>>>>>>>        2. Can you list the individual size of each dataset involved in
>>>>>>>>>> the query, e.g., catalog_returns, catalog_sales, and inventory? (I assume
>>>>>>>>>> 100GB is the overall size?)
>>>>>>>>>>        3. Do Spark/Hive/Pig saturate all CPUs on all machines, i.e., how
>>>>>>>>>> many partitions are running on each machine? (It seems that your AsterixDB
>>>>>>>>>> configuration wouldn't saturate all CPUs for queries --- in the current
>>>>>>>>>> AsterixDB master, the computation parallelism is set to be the same as the
>>>>>>>>>> storage parallelism, i.e., the number of iodevices on each NC. I've
>>>>>>>>>> submitted a new patch that allows flexible computation parallelism, which
>>>>>>>>>> should be able to get merged into master very soon.)
>>>>>>>>>>        Thanks!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yingyi
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 20, 2016 at 5:44 PM, mingda li <limingda1993@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Oh, sure. When we test the 100G multiple join, we find AsterixDB is
>>>>>>>>>>> slower than Spark (but still faster than Pig and Hive).
>>>>>>>>>>> I can share both plots with you: 1-10G.eps and 1-100G.eps. (We will only
>>>>>>>>>>> use 1-10G.eps in our paper.)
>>>>>>>>>>> And thanks for Ian's advice: *The dev list generally strips attachments.
>>>>>>>>>>> Maybe you can just put the config inline? Or link to a pastebin/gist?*
>>>>>>>>>>> Now I know why you couldn't see the attachments, so I have moved the
>>>>>>>>>>> plots and the two documents to my Dropbox.
>>>>>>>>>>> You can find
>>>>>>>>>>> 1-10G.eps here: https://www.dropbox.com/s/rk3xg6gigsfcuyq/1-10G.eps?dl=0
>>>>>>>>>>> 1-100G.eps here: https://www.dropbox.com/s/tyxnmt6ehau2ski/1-100G.eps?dl=0
>>>>>>>>>>> cc_conf.pdf here: https://www.dropbox.com/s/y3of1s17qdstv5f/cc_conf.pdf?dl=0
>>>>>>>>>>> CompleteQuery.pdf here: https://www.dropbox.com/s/lml3fzxfjcmf2c1/CompleteQuery.pdf?dl=0
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 20, 2016 at 4:40 PM, Tyson Condie <tcondie.ucla@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Mingda: Please also share the numbers for 100GB, which show AsterixDB not
>>>>>>>>>>>> quite doing as well as Spark. These 100GB results will not be in our
>>>>>>>>>>>> submission version, since they're not needed for the desired message:
>>>>>>>>>>>> picking the right join order matters. Nevertheless, I'd like to get a

