asterixdb-notifications mailing list archives

From "Wenhai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ASTERIXDB-1777) Budget does not consider the runfile frame that should be temporarily cached in massive memory.
Date Sat, 28 Jan 2017 09:31:24 GMT

     [ https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Wenhai updated ASTERIXDB-1777:
------------------------------
    Description: 
So far, we have confirmed that two cases should consider caching frames in memory (to speed
up pipeline scheduling) before we write (syncwrite) them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should cache the frames
in memory (rather than writing them directly to the Runfile) before forwarding them to the
distributed range partitions.
2. ExternalSort: The current Sorter caches frames within the limit set by compiler.sortmemory
in asterix-configuration.xml. In other words, we sort one such batch of frames in one shot.
In fact, we can run faster if we configure a smaller sortmemory budget (in our memory-resident
experiment, 64MB saves 20% of the sort time compared to 320MB), but the frames sorted in each
round are then written to the Runfile at a 1:1 ratio to the total data size. We can also treat
this case like the Replicate case above.
We are still thinking about other general cases like the above ...
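
To make the idea concrete, here is a minimal sketch of a frame cache that keeps frames in memory up to a byte budget and only falls back to a run file for the overflow. It is an illustration only, not AsterixDB or Hyracks code: the class name BudgetedFrameCache, its methods, and the budget_bytes parameter are all hypothetical, and frames are modeled as plain byte strings.

```python
import os
import tempfile


class BudgetedFrameCache:
    """Hypothetical sketch: keep frames in memory up to a byte budget,
    and fall back to a run file only for the overflow."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.cached = []           # frames resident in memory
        self.cached_bytes = 0
        self.run_file = None       # created lazily, only on overflow
        self.spilled_sizes = []    # byte length of each spilled frame, in order

    def append(self, frame):
        fits = self.cached_bytes + len(frame) <= self.budget_bytes
        # Once spilling has started, keep spilling so frame order is preserved.
        if self.run_file is None and fits:
            self.cached.append(frame)
            self.cached_bytes += len(frame)
            return
        if self.run_file is None:
            self.run_file = tempfile.TemporaryFile()
        self.run_file.seek(0, os.SEEK_END)
        self.run_file.write(frame)
        self.spilled_sizes.append(len(frame))

    def frames(self):
        """Yield all frames in arrival order: cached first, then spilled."""
        yield from self.cached
        if self.run_file is not None:
            self.run_file.seek(0)
            for size in self.spilled_sizes:
                yield self.run_file.read(size)

    def close(self):
        if self.run_file is not None:
            self.run_file.close()
            self.run_file = None
```

With a budget large enough for the whole input, no run file is ever created and the consumer reads everything from memory. With a small budget, only the frames beyond the budget reach disk, instead of every frame being written at a 1:1 ratio.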

  was:
So far, we have confirmed that two cases should consider caching frames in memory (to speed
up pipeline scheduling) before we write (syncwrite) them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should cache the frames
in memory before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames within the limit set by compiler.sortmemory
in asterix-configuration.xml. In other words, we sort one such batch of frames in one shot.
In fact, we can run faster if we configure a smaller sortmemory budget (in our memory-resident
experiment, 64MB saves 20% of the sort time compared to 320MB), but the frames sorted in each
round are then written to the Runfile at a 1:1 ratio to the total data size. We can also treat
this case like the Replicate case above.
We are still thinking about other general cases like the above ...


> Budget does not consider the runfile frame that should be temporarily cached in massive
memory.
> -----------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1777
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>         Environment: MAC/Linux
>            Reporter: Wenhai
>            Assignee: Wenhai
>
> So far, we have confirmed that two cases should consider caching frames in memory (to
speed up pipeline scheduling) before we write (syncwrite) them to the Runfile:
> 1. Replicate: In the parallel sort case, if we have massive memory, we should cache the frames
in memory (rather than writing them directly to the Runfile) before forwarding them to the
distributed range partitions.
> 2. ExternalSort: The current Sorter caches frames within the limit set by compiler.sortmemory
in asterix-configuration.xml. In other words, we sort one such batch of frames in one shot.
In fact, we can run faster if we configure a smaller sortmemory budget (in our memory-resident
experiment, 64MB saves 20% of the sort time compared to 320MB), but the frames sorted in each
round are then written to the Runfile at a 1:1 ratio to the total data size. We can also treat
this case like the Replicate case above.
> We are still thinking about other general cases like the above ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)