hama-dev mailing list archives

From "Suraj Menon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-559) Add a spilling message queue
Date Tue, 30 Oct 2012 12:34:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486833#comment-13486833
] 

Suraj Menon commented on HAMA-559:
----------------------------------

Hi, that was a nice catch. I found that I was doing one more buffer copy than needed. The
spilling buffer gives better performance, but only sometimes; I am currently investigating
why. I also implemented a synchronous disk queue without a spilling thread. Here are the
performance numbers for now. I added a case for 10 million integers to your benchmark code,
and I am posting the numbers for both scenarios. I am still trying to find out what changes
the numbers so drastically from one benchmark run to the next; this is not limited to the
spilling buffer.

{noformat}

    size            type        us linear runtime
 1000000       DISK_LIST 221546.22 ==============
 1000000 SPILLING_BUFFER 118403.87 =======
 1000000     DISK_BUFFER  40151.49 ==
10000000       DISK_LIST 473334.31 ==============================
10000000 SPILLING_BUFFER 360539.53 ======================
10000000     DISK_BUFFER 389689.06 ========================

vm: java
trial: 0
benchmark: Spill

{noformat}


The run with bad performance:
{noformat}

    size            type        us linear runtime
 1000000       DISK_LIST   38550.9 =
 1000000 SPILLING_BUFFER  140961.3 ===
 1000000     DISK_BUFFER   44809.6 =
10000000       DISK_LIST  340909.8 =========
10000000 SPILLING_BUFFER 1116445.2 ==============================
10000000     DISK_BUFFER  374593.0 ==========


{noformat}

                
> Add a spilling message queue
> ----------------------------
>
>                 Key: HAMA-559
>                 URL: https://issues.apache.org/jira/browse/HAMA-559
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Suraj Menon
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: HAMA-559.patch-v1, spilling_buffer_cpu_usage_text_write.png,
>                      SpillingBufferProfile-2012-10-27.snapshot,
>                      spilling_buffer_profile_cpu_graph_test_write.png,
>                      spilling_buffer_profile_cpugraph_writeUTF.png,
>                      spillingbuffer_profile_cpu_writeUTF.png, spilling_buffer_profile_LOCK.JPG,
>                      spilling_buffer_profile_timesplit_text_write.png,
>                      spilling_buffer_profile_writeUTF.png
>
>
> After HAMA-521 is done, we can add a spilling queue which just holds the messages in
> RAM that fit into the heap space. The rest can be flushed to disk.
> We may call this a HybridQueue or something like that.
> The benefits should be that we don't have to flush to disk so often and get faster. However,
> we may have more GC, so it may not always be faster overall.
> The requirements for this queue also include:
> - The message object, once written to the queue (after returning from the write call),
>   could be modified, but the changes should not be reflected in the messages stored in the queue.
> - For now let's implement a queue that does not support concurrent reading and writing.
>   This feature is needed when we implement asynchronous communication.
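
The requirements quoted above could be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class name SpillingQueueSketch, the length-prefixed byte[] message format, the temp-file spill target, and the prepareRead/poll method names are all assumptions made for the example. Messages are serialized at write time (so later changes to the caller's object are not visible in the queue), they stay in a fixed-size in-memory buffer while they fit, everything after the first overflow goes synchronously to disk to keep FIFO order, and writing and reading are separate phases with no concurrency support.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a spilling queue; not the class from the HAMA-559 patch. */
final class SpillingQueueSketch implements AutoCloseable {
  private final ByteBuffer memory;   // in-RAM part of the queue
  private final Path spillFile;      // overflow is written here
  private DataOutputStream spillOut; // opened lazily on the first spill
  private DataInputStream spillIn;   // opened by prepareRead() if anything was spilled
  private int size;                  // number of messages not yet polled

  SpillingQueueSketch(int memoryBytes) throws IOException {
    this.memory = ByteBuffer.allocate(memoryBytes);
    this.spillFile = Files.createTempFile("spill-sketch", ".bin");
  }

  /** Copies the message bytes now, so the caller may modify the array afterwards. */
  void write(byte[] message) throws IOException {
    int needed = Integer.BYTES + message.length;
    if (spillOut == null && memory.remaining() >= needed) {
      memory.putInt(message.length).put(message);
    } else {
      // Once spilling has started, every later message goes to disk to preserve order.
      if (spillOut == null) {
        spillOut = new DataOutputStream(
            new BufferedOutputStream(Files.newOutputStream(spillFile)));
      }
      spillOut.writeInt(message.length);
      spillOut.write(message);
    }
    size++;
  }

  /** Ends the write phase. Must be called once before the first poll(). */
  void prepareRead() throws IOException {
    memory.flip();
    if (spillOut != null) {
      spillOut.close();
      spillIn = new DataInputStream(
          new BufferedInputStream(Files.newInputStream(spillFile)));
    }
  }

  /** Returns the next message in write order, or null when the queue is empty. */
  byte[] poll() throws IOException {
    if (size == 0) {
      return null;
    }
    byte[] message;
    if (memory.hasRemaining()) {
      message = new byte[memory.getInt()];
      memory.get(message);
    } else {
      message = new byte[spillIn.readInt()];
      spillIn.readFully(message);
    }
    size--;
    return message;
  }

  @Override
  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
    }
    if (spillIn != null) {
      spillIn.close();
    }
    Files.deleteIfExists(spillFile);
  }

  public static void main(String[] args) throws IOException {
    try (SpillingQueueSketch queue = new SpillingQueueSketch(16)) {
      byte[] msg = {1, 2, 3};
      queue.write(msg);
      msg[0] = 99; // modified after write; the queued copy is unaffected
      queue.write(new byte[] {4, 5, 6, 7, 8, 9, 10, 11, 12}); // does not fit, spills
      queue.prepareRead();
      for (byte[] m = queue.poll(); m != null; m = queue.poll()) {
        System.out.println(java.util.Arrays.toString(m));
      }
    }
  }
}
```

The disk write here happens on the caller's thread, similar in spirit to the synchronous disk queue mentioned in the comment; a separate spilling thread would need extra synchronization that this sketch leaves out.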

