asterixdb-notifications mailing list archives

From "Wenhai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ASTERIXDB-1777) Budget does not consider the runfile frame that should be temporarily cached in massive memory.
Date Sat, 28 Jan 2017 09:31:24 GMT

     [ https://issues.apache.org/jira/browse/ASTERIXDB-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenhai updated ASTERIXDB-1777:
------------------------------
    Description: 
So far, we have identified two cases in which frames should be cached in memory (to speed
up pipeline scheduling) before we write (syncwrite) them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should cache the frames
in memory before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the limit set by compiler.sortmemory
in asterix-configuration.xml. In other words, it sorts one such batch of frames at a time.
In fact, sorting runs faster with a smaller sortmemory budget (in our memory-resident
experiment, 64MB saved 20% of the sort time compared with 320MB), but the frames sorted in
each round are then written to the Runfile, so the volume written is 1:1 with the total data
size. We can treat this case like the Replicate case above.
We are still thinking about general cases like the ones above ...
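
To make the ExternalSort case concrete, here is a minimal Python sketch of the idea. It is not
AsterixDB code. The function name external_sort and the parameters sort_budget and cache_budget
are hypothetical, and budgets are counted in records rather than frames or bytes. Each batch of
sort_budget records is sorted in one shot. A sorted run stays in memory while it still fits in
cache_budget, and is written to a temporary run file otherwise.

```python
import heapq
import os
import pickle
import tempfile


def external_sort(records, sort_budget, cache_budget):
    """Sort `records` in batches of `sort_budget` items (hypothetical sketch).

    Sorted runs stay in memory while the total number of cached items fits
    in `cache_budget`; runs that do not fit are spilled to temporary files.
    Returns (sorted_list, number_of_items_written_to_disk).
    """
    cached_runs = []    # sorted runs kept in memory
    spilled_paths = []  # sorted runs written to temporary run files
    cached_items = 0
    spilled_items = 0

    for start in range(0, len(records), sort_budget):
        run = sorted(records[start:start + sort_budget])
        if cached_items + len(run) <= cache_budget:
            cached_runs.append(run)
            cached_items += len(run)
        else:
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(run, f)
            spilled_paths.append(path)
            spilled_items += len(run)

    def read_run(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    try:
        # Merge the in-memory runs with the runs read back from disk.
        runs = cached_runs + [read_run(p) for p in spilled_paths]
        merged = list(heapq.merge(*runs))
    finally:
        for p in spilled_paths:
            os.remove(p)
    return merged, spilled_items


data = [7, 3, 9, 1, 8, 2, 6, 4, 5, 0]
# With no cache budget, every record is written to a run file (1:1 with the data size).
result_no_cache, spilled_no_cache = external_sort(data, sort_budget=4, cache_budget=0)
# With a cache budget of 8 records, only the last run has to be written.
result_cached, spilled_cached = external_sort(data, sort_budget=4, cache_budget=8)
```

In this sketch a small sort_budget keeps each in-memory sort short, while cache_budget decides
how much of the sorted output avoids the run file. It does not measure sort time, so the 20%
figure above is not reproduced here.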

  was:
So far, we have identified two cases in which frames should be cached in memory before we
write (syncwrite) them to the Runfile:
1. Replicate: In the parallel sort case, if we have massive memory, we should cache the frames
in memory before forwarding them to the distributed range partitions.
2. ExternalSort: The current Sorter caches frames up to the limit set by compiler.sortmemory
in asterix-configuration.xml. In other words, it sorts one such batch of frames at a time.
In fact, sorting runs faster with a smaller sortmemory budget (in our memory-resident
experiment, 64MB saved 20% of the sort time compared with 320MB), but the frames sorted in
each round are then written to the Runfile, so the volume written is 1:1 with the total data
size. We can treat this case like the Replicate case above.
We are still thinking about general cases like the ones above ...


> Budget does not consider the runfile frame that should be temporarily cached in massive memory.
> -----------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1777
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1777
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>         Environment: MAC/Linux
>            Reporter: Wenhai
>            Assignee: Wenhai
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
