asterixdb-notifications mailing list archives

From "Yingyi Bu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1433) Multiple cores with huge memory slow down in the big fact table aggregation.
Date Wed, 11 May 2016 18:03:12 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280525#comment-15280525
] 

Yingyi Bu commented on ASTERIXDB-1433:
--------------------------------------

[~lwhay]

>> However, the running trace results demonstrate that, as compared to the big memory
configurations, 
Is it possible to paste the query and dataset details (e.g., schema, size) here?

>> the original tables is always re-loaded from the disk to the actual memory even they
have been handled in the latest query. 
We do have a read-only disk buffer cache. Do you have more concrete numbers, e.g., dataset
size, number of read/write I/Os (from /proc/...), response time, etc.?

Best,
Yingyi

> Multiple cores with huge memory slow down in the big fact table aggregation.
> ----------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1433
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1433
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>          Components: Hyracks Core
>         Environment: 10 nodes x Linux Ubuntu, 6 CPUs x 4 cores per CPU, 128 GB memory per node.
>            Reporter: Wenhai
>
> This is a classic hardware platform that supports datasets at the TB scale in total.
> AsterixDB does extremely well on complex queries that include multiple join operators
> over a high-selectivity select operator. However, the running trace results show that,
> compared with the big-memory configurations, the original tables are always reloaded
> from disk into memory even if they were processed by the most recent query. To this end,
> why not provide a strategy that keeps the intermediate data of the last completed query
> in memory and frees it when memory is not enough for a new query? In some cases, the
> user will repeatedly trigger queries with different parameters on the same tables, for
> example, variant-parameter aggregation on a single big fact table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
