apex-dev mailing list archives

From "bright chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2190) Use shared memory to serial spillable data structure
Date Mon, 22 Aug 2016 17:38:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431248#comment-15431248 ]

bright chen commented on APEXMALHAR-2190:

Proposal: window support for memory management.
The data could then be reset window by window instead of all at once. Window support
could be managed by outside code, for example by tying each window to its own instance of Block
or BlockStream. It would be more convenient, though, to add functions that support windows directly.
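
To illustrate the per-window idea in this comment, here is a minimal Java sketch in which outside code ties each window to its own block and resets only the blocks of finished windows. The class names (SketchBlock, SketchWindowedBlocks) and methods (beginWindow, resetUpTo) are hypothetical and chosen for illustration; they are not the Malhar API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal block: a growable byte array that can be rewound for reuse. */
final class SketchBlock {
  private byte[] data = new byte[32];
  private int size = 0;

  void write(byte[] src) {
    if (size + src.length > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, size + src.length));
    }
    System.arraycopy(src, 0, data, size, src.length);
    size += src.length;
  }

  int size() {
    return size;
  }

  /** Forgets the content but keeps the allocated array. */
  void reset() {
    size = 0;
  }
}

/** Hypothetical wrapper (an assumption, not Malhar code) that maps each window id to its own block. */
final class SketchWindowedBlocks {
  private final TreeMap<Long, SketchBlock> blocksByWindow = new TreeMap<>();
  private final ArrayDeque<SketchBlock> freeBlocks = new ArrayDeque<>();
  private SketchBlock current;

  /** Selects the block for this window, reusing a previously reset block when one is free. */
  void beginWindow(long windowId) {
    current = blocksByWindow.computeIfAbsent(windowId, id -> {
      SketchBlock reused = freeBlocks.poll();
      return reused != null ? reused : new SketchBlock();
    });
  }

  void write(byte[] src) {
    if (current == null) {
      throw new IllegalStateException("call beginWindow first");
    }
    current.write(src);
  }

  /** Resets only the blocks of windows up to and including windowId; later windows keep their data. */
  void resetUpTo(long windowId) {
    Iterator<Map.Entry<Long, SketchBlock>> it =
        blocksByWindow.headMap(windowId, true).entrySet().iterator();
    while (it.hasNext()) {
      SketchBlock block = it.next().getValue();
      block.reset();
      if (block == current) {
        current = null;
      }
      freeBlocks.add(block);
      it.remove();
    }
  }

  int liveWindows() {
    return blocksByWindow.size();
  }

  int bytesIn(long windowId) {
    SketchBlock block = blocksByWindow.get(windowId);
    return block == null ? 0 : block.size();
  }
}
```

With this shape, a caller would invoke beginWindow at the start of each window and resetUpTo once a window's data is no longer needed, so memory for newer windows stays untouched.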

> Use shared memory to serial spillable data structure
> ----------------------------------------------------
>                 Key: APEXMALHAR-2190
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
> The spillable data structures allocate a lot of temporary memory and perform many memory
> copies when serializing data (see SliceUtils.concatenate(byte[], byte[])), which uses up
> memory very quickly. See APEXMALHAR-2182.
> Use shared memory to avoid allocating temporary memory and copying data.
> Some basic ideas:
> - SerToLVBuffer interface: provides a method serTo(T object, LengthValueBuffer buffer).
> Instead of allocating memory and then returning the serialized data, this method lets the caller
> pass in the buffer, so different objects, or an object with embedded objects, can share the same LengthValueBuffer.
> - LengthValueBuffer: a buffer that manages memory as length and value (the generic format
> of serialized data). It provides a length placeholder mechanism that avoids temporary memory
> and data copies when the length is known only after the data has been serialized.
> - Memory management classes: the ByteStream interface and its implementations
> Block, FixedBlock and BlocksStream, which provide a mechanism to dynamically allocate and manage
> memory. They basically provide the following functions. I tried some other stream mechanisms
> such as ByteArrayInputStream, but they can't meet the third criterion and don't perform well (50%
>   - dynamically allocate memory
>   - reset memory for reuse
>   - BlocksStream makes sure the output slices are not changed when extra memory is needed;
> Block can change the buffer that the output slices reference if data was moved by a memory
> reallocation (BlocksStream is the better solution).
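
The ideas quoted above can be sketched roughly as follows in Java. This is a minimal illustration under stated assumptions, not the Malhar implementation: only the serTo(T object, LengthValueBuffer buffer) shape comes from the description, while the Sketch-prefixed class names, the 4-byte big-endian length prefix, and the reserveLength/fillLength methods are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for LengthValueBuffer: one growable array shared by all serializers. */
final class SketchLengthValueBuffer {
  private byte[] data = new byte[64];
  private int size = 0;

  /** Reserves 4 bytes for a length that is not known yet and returns their offset. */
  int reserveLength() {
    ensureCapacity(4);
    int offset = size;
    size += 4;
    return offset;
  }

  /** Appends value bytes directly into the shared array. */
  void write(byte[] src) {
    ensureCapacity(src.length);
    System.arraycopy(src, 0, data, size, src.length);
    size += src.length;
  }

  /** Back-fills the placeholder with the number of bytes written after it (big-endian). */
  void fillLength(int placeholderOffset) {
    int length = size - placeholderOffset - 4;
    data[placeholderOffset] = (byte) (length >>> 24);
    data[placeholderOffset + 1] = (byte) (length >>> 16);
    data[placeholderOffset + 2] = (byte) (length >>> 8);
    data[placeholderOffset + 3] = (byte) length;
  }

  int size() {
    return size;
  }

  /** Copies the content out; used here only to inspect the result. */
  byte[] toByteArray() {
    return Arrays.copyOf(data, size);
  }

  /** Makes the whole array available for reuse without reallocating. */
  void reset() {
    size = 0;
  }

  private void ensureCapacity(int extra) {
    if (size + extra > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, size + extra));
    }
  }
}

/** Mirrors the serTo(T object, LengthValueBuffer buffer) shape from the description. */
interface SketchSerToLVBuffer<T> {
  void serTo(T object, SketchLengthValueBuffer buffer);
}

/** Writes a string as length + UTF-8 bytes into the caller's buffer. */
final class SketchStringSerde implements SketchSerToLVBuffer<String> {
  @Override
  public void serTo(String object, SketchLengthValueBuffer buffer) {
    int placeholder = buffer.reserveLength();
    buffer.write(object.getBytes(StandardCharsets.UTF_8));
    buffer.fillLength(placeholder);
  }
}

/** Serializes a pair by delegating to the element serde on the same buffer (embedded objects). */
final class SketchPairSerde implements SketchSerToLVBuffer<String[]> {
  private final SketchStringSerde element = new SketchStringSerde();

  @Override
  public void serTo(String[] pair, SketchLengthValueBuffer buffer) {
    int placeholder = buffer.reserveLength(); // outer length is unknown until both elements are written
    element.serTo(pair[0], buffer);
    element.serTo(pair[1], buffer);
    buffer.fillLength(placeholder);
  }
}
```

The outer serde never builds an intermediate array for its elements and never concatenates arrays: it reserves the length slot, lets the element serde write into the same buffer, and patches the length afterwards.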

This message was sent by Atlassian JIRA
