asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: PrimaryIndexOperationTracker
Date Tue, 24 Oct 2017 04:24:03 GMT
Hi,

I think that it’s actually quite common to have multiple iodevices.
Even if there is only one disk, configuring multiple iodevices can
improve performance - especially if a significant part of the data fits
into memory (and disk IO is not the limiting factor). And the reason
for that seems to be that the degree of parallelism for the "lower"
part of a query plan is limited by the number of available iodevices.
Also, I think that multiple disks are not uncommon for heavier data
management workloads. The d2 instances [1] on AWS give some example
configurations.
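To make that concrete: with an ini-style cc.conf, an NC can be given
several iodevices as a comma-separated list. The snippet below is only a
minimal sketch; the NC name and paths are made up, and other required
settings are omitted.

  [nc/asterix_nc1]
  address=127.0.0.1
  iodevices=/data/vol0/iodevice,/data/vol1/iodevice,/data/vol2/iodevice

Each listed path becomes one iodevice and (by default, as far as I
recall) each iodevice hosts one storage partition, which is what drives
the parallelism of the lower part of the plan.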

All this is just to say that we should not discount
a) the need to support multiple disks and
b) the limitations to linear scalability introduced by "1-per-NC" (vs.
    "1-per-partition") objects.
On machines with a few more cores (10+) we've used multiple NCs (as
opposed to multiple partitions per NC) to achieve better scale-out and
better query performance.
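As a rough sketch of that kind of setup (names, paths, and most settings
made up or omitted; on a shared machine each NC also needs its own ports
and directories), cc.conf would simply contain one section per NC:

  [nc/nc_a]
  iodevices=/data/vol0/iodevice,/data/vol1/iodevice
  txn.log.dir=/data/vol0/txnlog

  [nc/nc_b]
  iodevices=/data/vol2/iodevice,/data/vol3/iodevice
  txn.log.dir=/data/vol2/txnlog

Since the "1-per-NC" objects are then instantiated once per NC, the two
NCs scale more independently than two partitions inside a single NC.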

Cheers,
Till

[1] 
https://aws.amazon.com/about-aws/whats-new/2015/03/now-available-d2-instances-the-latest-generation-of-amazon-ec2-dense-storage-instances/

On 23 Oct 2017, at 20:18, Chen Luo wrote:

> I was confused about this problem last quarter, and I remember
> (maybe Ian) told me the same reason: it's mainly used for memory
> budgeting (primary and secondary indexes of a dataset on the same
> node). Moreover, it's not very common to have many I/O devices on a
> node unless it has multiple disks.
>
> Best regards,
> Chen Luo
>
> On Mon, Oct 23, 2017 at 8:12 PM, Sattam Alsubaiee 
> <salsubaiee@gmail.com>
> wrote:
>
>> There is a fundamental reason behind that (unless that has changed). 
>> All
>> partitions of a dataset (primary and secondary indexes) share the 
>> memory
>> budget for their in-memory components. So they use the same optracker 
>> to
>> synchronize their operations.
>>
>>
>> Sattam
>>
>> On Oct 24, 2017 5:59 AM, "Wail Alkowaileet" <wael.y.k@gmail.com> 
>> wrote:
>>
>>> Thanks Abdullah!
>>>
>>> On Mon, Oct 23, 2017 at 7:15 PM, abdullah alamoudi 
>>> <bamousaa@gmail.com>
>>> wrote:
>>>
>>>> Hi Wail,
>>>> There is no fundamental reason why it is one. In fact, it has been
>>>> on our todo for a long time to make it one per partition.
>>>>
>>>> Cheers,
>>>> Abdullah.
>>>>
>>>>> On Oct 23, 2017, at 7:14 PM, Wail Alkowaileet <wael.y.k@gmail.com>
>>>> wrote:
>>>>>
>>>>> Dear devs,
>>>>>
>>>>> I have a question regarding the opTracker. Currently, we
>>>>> initialize one opTracker per dataset in every NC.
>>>>>
>>>>> My question is: why is it per dataset and not per partition? Are
>>>>> there transactional constraints for that?
>>>>>
>>>>> From what I can see, the opTracker can create a lot of contention
>>>>> when there are many IO devices. For instance, each insert will call
>>>>> *LSMHarness.getAndEnterComponents()* [1], which does
>>>>> *synchronized(opTracker)*. That means (correct me if I'm wrong)
>>>>> inserts are going to serialize the *enterComponent()* part among
>>>>> partitions.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMHarness.java#L86
>>>>>
>>>>> --
>>>>>
>>>>> *Regards,*
>>>>> Wail Alkowaileet
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Regards,*
>>> Wail Alkowaileet
>>>
>>
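To illustrate the point Wail raises above (and Sattam's shared-budget
rationale), here is a small self-contained Java sketch. These classes are
not AsterixDB code and the names are invented; the real LSMHarness /
PrimaryIndexOperationTracker logic is far more involved. It only shows why
a single per-dataset tracker makes inserts from different partitions enter
the same monitor, while per-partition trackers would not:

// Illustrative sketch only: NOT AsterixDB code, all names invented.
final class OperationTrackerSketch {

    // Stand-in for a per-dataset operation tracker: one shared monitor and
    // (conceptually) one shared in-memory budget for all partitions of the
    // dataset on an NC.
    static final class DatasetTracker {
        private int activeOperations;

        // Every insert on every partition of the dataset enters this monitor,
        // analogous to the synchronized(opTracker) block in
        // LSMHarness.getAndEnterComponents().
        synchronized void beforeOperation() {
            activeOperations++;
            // ... check/charge the shared in-memory component budget here ...
        }

        synchronized void afterOperation() {
            activeOperations--;
        }
    }

    // Stand-in for the insert path of one partition's LSM index.
    static final class PartitionIndex {
        private final DatasetTracker tracker;

        PartitionIndex(DatasetTracker tracker) {
            this.tracker = tracker;
        }

        void insert(String record) {
            tracker.beforeOperation(); // serializes with every partition sharing this tracker
            try {
                // ... the actual in-memory component insert would go here ...
            } finally {
                tracker.afterOperation();
            }
        }
    }

    public static void main(String[] args) {
        // "1-per-NC" (current) style: both partitions contend on the same monitor.
        DatasetTracker shared = new DatasetTracker();
        PartitionIndex p0 = new PartitionIndex(shared);
        PartitionIndex p1 = new PartitionIndex(shared);
        p0.insert("a");
        p1.insert("b");

        // "1-per-partition" style: no cross-partition monitor contention, but the
        // shared memory budget would then need some other form of coordination.
        PartitionIndex q0 = new PartitionIndex(new DatasetTracker());
        PartitionIndex q1 = new PartitionIndex(new DatasetTracker());
        q0.insert("a");
        q1.insert("b");
    }
}

With one tracker per partition the monitor contention goes away, but the
shared budget Sattam mentions would then have to be coordinated across
trackers, which is presumably why it has stayed per dataset so far.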
