asterixdb-dev mailing list archives

From Yingyi Bu <buyin...@gmail.com>
Subject Re: Should in-memory components from different dataset share the entire memory?
Date Thu, 10 Mar 2016 22:55:28 GMT
>> A more fundamental question: is it possible that all those datasets share
>> a global budget in a multi-tenant way?
In principle, the budget should just be an upper bound. If a dataset doesn't
need that much, it shouldn't pre-allocate all
"storage.memorycomponent.numpages" pages.

However, in the current implementation, we pre-allocate all in-memory pages
upfront:
https://github.com/apache/incubator-asterixdb-hyracks/blob/master/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/VirtualBufferCache.java#L247

I think we should fix it to allocate memory dynamically, only when needed.
(The disk buffer cache already does that.)
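
A minimal sketch of what allocating on demand could look like, as opposed to
pre-allocating every page upfront. This is not the actual VirtualBufferCache
code: the class name LazyPagePool and all of its methods are hypothetical,
and maxPages only plays the role of "storage.memorycomponent.numpages" as an
upper bound.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and structure are assumptions,
// not the real VirtualBufferCache API.
final class LazyPagePool {
    private final int pageSize;
    private final int maxPages; // upper bound, not a pre-allocation count
    private final List<ByteBuffer> pages = new ArrayList<>();

    LazyPagePool(int pageSize, int maxPages) {
        if (pageSize <= 0 || maxPages <= 0) {
            throw new IllegalArgumentException("pageSize and maxPages must be positive");
        }
        this.pageSize = pageSize;
        this.maxPages = maxPages;
        // Note: no pages are allocated here.
    }

    // Allocates one more page only when a caller asks for it.
    // Returns null once the budget (maxPages) is exhausted.
    synchronized ByteBuffer allocatePage() {
        if (pages.size() >= maxPages) {
            return null;
        }
        ByteBuffer page = ByteBuffer.allocate(pageSize);
        pages.add(page);
        return page;
    }

    // Number of pages actually in use so far.
    synchronized int allocatedPages() {
        return pages.size();
    }

    // Memory actually held, in bytes.
    synchronized long allocatedBytes() {
        return (long) pages.size() * pageSize;
    }

    // Drops all pages, e.g. after the in-memory component is flushed.
    synchronized void reset() {
        pages.clear();
    }
}
```

With this shape, a tiny dataset that only ever touches a few pages holds only
those pages, while a large dataset can still grow up to the configured bound.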

Best,
Yingyi


On Thu, Mar 10, 2016 at 2:46 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
wrote:

> Dear Devs,
>
> I have some questions about the memory management of the in-memory
> components for different datasets.
>
> The current AsterixDB backing the cloudberry demo goes down every few days.
> It always throws an exception like the following:
> Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: Failed
> to open index with resource ID 7 since it does not exist.
>
> As described in ASTERIXDB-1337, each dataset has a fixed budget no matter
> how small/big it is. The number of datasets that can be loaded at the same
> time is then also fixed at $number =
> storage.memorycomponent.globalbudget/storage.memorycomponent.numpages. My
> question is: if we have more than $number datasets, will eviction happen?
> Will it evict the entire victim dataset? Based on the symptom of the above
> exception, it seems the metadata gets evicted. Could we protect the
> metadata from eviction?
>
> A more fundamental question: is it possible that all those datasets share
> a global budget in a multi-tenant way?
> In my workload there is one main dataset (~10 GB) and five tiny auxiliary
> datasets (each <20 MB). In addition, the client will create a bunch of
> temporary datasets, depending on how many concurrent users there are, and
> each temp-dataset will be "refreshed" for a new query. (The refresh is done
> by dropping and re-creating the temp-dataset.) It's hard to find a single
> storage.memorycomponent.numpages value that makes every dataset happy.
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>
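
The fixed-budget arithmetic in the quoted message can be sketched as follows.
This is only an illustration: it assumes the global budget is expressed in
bytes and that every page has one fixed size, so a page-size parameter (not
mentioned in the message) is added to make the units line up. The class and
method names are hypothetical, and the numbers in the usage note are made up.

```java
// Hypothetical helper; an assumption-laden sketch of the $number formula.
final class DatasetBudget {
    private DatasetBudget() {
    }

    // How many datasets fit in memory at once if each one reserves
    // numPages pages of pageSizeBytes bytes out of globalBudgetBytes.
    static long maxResidentDatasets(long globalBudgetBytes, long numPages, long pageSizeBytes) {
        if (globalBudgetBytes < 0 || numPages <= 0 || pageSizeBytes <= 0) {
            throw new IllegalArgumentException("budget must be >= 0, page count and size > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        long perDatasetBytes = Math.multiplyExact(numPages, pageSizeBytes);
        return globalBudgetBytes / perDatasetBytes;
    }
}
```

For example, with a made-up 512 MB global budget and 256 pages of 128 KB per
dataset (32 MB each), maxResidentDatasets(512L << 20, 256, 128L << 10)
evaluates to 16, regardless of whether a dataset is 10 GB or 20 MB.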
