asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Should in-memory components from different dataset share the entire memory?
Date Fri, 11 Mar 2016 01:34:01 GMT
There is still the broader question, since the overall memory of a node 
(NC) has to be shared among (at least) the three kinds of memory that I 
also mentioned just a minute ago:
   - buffer cache (global/shared memory for caching pages of disk 
components)
   - in-memory components (i.e., the current thread's topic)
   - working memory (needed for hash-based and sort-based operators like 
joins/aggs)
Whatever we do next should probably keep this bigger-picture/zero-sum 
game in mind.
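
To make that zero-sum constraint concrete, here is a minimal sketch 
(hypothetical names and numbers, not actual AsterixDB code) of how one 
node's budget gets carved up:

// Hypothetical illustration of the zero-sum split of one NC's memory; the
// names and numbers are made up for this sketch, not AsterixDB APIs.
public class NodeMemorySketch {
    public static void main(String[] args) {
        long nodeBudget       = 8L * 1024 * 1024 * 1024; // total NC memory, e.g. 8 GB
        long bufferCache      = 4L * 1024 * 1024 * 1024; // shared cache for disk-component pages
        long memoryComponents = 2L * 1024 * 1024 * 1024; // budget for in-memory LSM components

        // Whatever is left is all the working memory available to hash-based
        // and sort-based operators (joins/aggs/sorts) put together.
        long workingMemory = nodeBudget - bufferCache - memoryComponents;
        System.out.println("Working memory for joins/aggs: "
                + workingMemory / (1024 * 1024) + " MB");
    }
}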


On 3/10/16 3:23 PM, Jianfeng Jia wrote:
> Another way is to allocate the entire global space upfront, with no upper
> bound for each dataset: bigger datasets simply get more pages. The drawback is
> that we may waste more space if all of the datasets are small.
>
> The ideal case would be a single dynamic memory-allocation manager with one
> global upper bound, so that all datasets can share the space without an extra
> per-dataset bound.
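
That single shared budget could be as simple as one global counter; here is a
minimal sketch (a hypothetical GlobalMemoryBudget class, not existing AsterixDB
code) in which datasets reserve pages against one global bound and hand them
back when a component is flushed:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of one global in-memory-component budget shared by all
// datasets, with no per-dataset bound; not existing AsterixDB code.
public class GlobalMemoryBudget {
    private final long maxBytes;                         // global upper bound
    private final AtomicLong usedBytes = new AtomicLong();

    public GlobalMemoryBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // A dataset asks for one more page; false means the global bound is hit
    // and the caller should flush (or evict) something to free space.
    public boolean tryReserve(long pageSizeBytes) {
        while (true) {
            long used = usedBytes.get();
            if (used + pageSizeBytes > maxBytes) {
                return false;
            }
            if (usedBytes.compareAndSet(used, used + pageSizeBytes)) {
                return true;
            }
        }
    }

    // Pages go back to the global pool when an in-memory component is flushed.
    public void release(long pageSizeBytes) {
        usedBytes.addAndGet(-pageSizeBytes);
    }
}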
>
>
>> On Mar 10, 2016, at 2:55 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>
>>> A more fundamental question: is it possible for all those datasets to share a
>>> global budget in a multi-tenant way?
>> In principle, the budget should just be an upper bound. If a dataset doesn't
>> need that much, it shouldn't pre-allocate all
>> "storage.memorycomponent.numpages" pages.
>>
>> However, in the current implementation, we pre-allocate all in-memory pages
>> upfront:
>> https://github.com/apache/incubator-asterixdb-hyracks/blob/master/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/VirtualBufferCache.java#L247
>>
>> I think we should fix it to dynamically allocate memory when needed. (The
>> disk buffer cache already does that.)
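
For the dynamic version, a minimal sketch (hypothetical names, not a patch
against the real VirtualBufferCache) would allocate each page buffer only on
first use and stop at the budget, instead of grabbing all of the
"storage.memorycomponent.numpages" buffers in the constructor:

import java.nio.ByteBuffer;

// Hypothetical sketch of allocate-on-demand up to a page budget; not the
// actual VirtualBufferCache code.
public class LazyPagePool {
    private final int pageSize;  // bytes per page
    private final int maxPages;  // the budget, e.g. storage.memorycomponent.numpages
    private int allocatedPages;

    public LazyPagePool(int pageSize, int maxPages) {
        this.pageSize = pageSize;
        this.maxPages = maxPages;
    }

    // Allocates a page only when it is actually needed, never past the budget.
    public synchronized ByteBuffer allocatePage() {
        if (allocatedPages >= maxPages) {
            return null; // budget exhausted: caller should schedule a flush
        }
        allocatedPages++;
        return ByteBuffer.allocate(pageSize); // allocated on demand, not upfront
    }

    // Pages are returned when the in-memory component is flushed and recycled.
    public synchronized void releasePage() {
        allocatedPages--;
    }
}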
>>
>> Best,
>> Yingyi
>>
>>
>> On Thu, Mar 10, 2016 at 2:46 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
>> wrote:
>>
>>> Dear Devs,
>>>
>>> I have some questions about the memory management of the in-memory
>>> components for different datasets.
>>>
>>> The current AsterixDB backing the cloudberry demo is down every few days.
>>> It always throws an exception like the following:
>>> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Failed
>>> to open index with resource ID 7 since it does not exist.
>>>
>>> As described in ASTERIXDB-1337, each dataset has a fixed budget no matter
>>> how small or big it is, so the number of datasets that can be loaded at the
>>> same time is also fixed: $number =
>>> storage.memorycomponent.globalbudget/storage.memorycomponent.numpages. My
>>> question is: if we have more than $number datasets, will eviction happen?
>>> Will the victim's entire dataset be evicted? Based on the symptom of the
>>> above exception, it seems the metadata got evicted. Could we protect the
>>> metadata from eviction?
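
To put rough (hypothetical) numbers on that: if
storage.memorycomponent.globalbudget were 512 MB and each dataset's in-memory
budget (storage.memorycomponent.numpages times the page size) came to 32 MB,
then $number = 512/32 = 16, and the 17th dataset to become active would force
some victim's in-memory component out.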
>>>
>>> A more fundamental question: is it possible for all those datasets to share a
>>> global budget in a multi-tenant way?
>>> In my workload there is one main dataset (~10 GB) and five tiny auxiliary
>>> datasets (each < 20 MB). In addition, the client creates a bunch of
>>> temporary datasets, depending on how many concurrent users there are, and
>>> each temp-dataset is "refreshed" for every new query. (The refresh is done
>>> by dropping and re-creating the temp-dataset.) It's hard to find one
>>> storage.memorycomponent.numpages value that makes every dataset happy.
>>>
>>>
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>>>
>>>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>

