hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-642) Make GraphRunner disk based
Date Thu, 18 Oct 2012 01:22:03 GMT

    [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478564#comment-13478564 ]

Edward J. Yoon commented on HAMA-642:
-------------------------------------

I just tested both the trunk version and the patch-applied version again. This patch is still unstable.

 - Input size: 1167771205 
 - Tasks per node: 10
 - Physical nodes: 18
 - bsp.child.java.opts: 4GB

The job (patch-applied version) always fails without a specific error message.

{code}
attempt_201210171757_0001_000047_0, attempt_201210171757_0001_000134_0, attempt_201210171757_0001_000062_0,
attempt_201210171757_0001_000149_0, attempt_201210171757_0001_000046_0, attempt_201210171757_0001_000117_0,
attempt_201210171757_0001_000041_0, attempt_201210171757_0001_000136_0, attempt_201210171757_0001_000140_0,
attempt_201210171757_0001_000098_0, attempt_201210171757_0001_000103_0, attempt_201210171757_0001_000120_0,
attempt_201210171757_0001_000036_0, attempt_201210171757_0001_000099_0, attempt_201210171757_0001_000045_0,
attempt_201210171757_0001_000066_0, attempt_201210171757_0001_000160_0]
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier(): superstep:0 taskid:attempt_201210171757_0001_000080_0 wait for lowest notify.
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() at superstep: 0 taskid:attempt_201210171757_0001_000080_0 lowest notify other nodes.
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() !!! checking znodes contnains /ready node or not: at superstep:0 znode:[ready]
12/10/17 17:57:15 DEBUG sync.ZooKeeperSyncClientImpl: leaveBarrier() at superstep:0 znode size: (0) znodes:[]
12/10/17 17:57:15 DEBUG bsp.Counters: Adding TIME_IN_SYNC_MS
12/10/17 17:57:15 DEBUG message.AbstractMessageManager: Creating new class org.apache.hama.bsp.message.MemoryQueue
12/10/17 17:57:15 INFO ipc.Server: Stopping server on 61004
12/10/17 17:57:15 INFO ipc.Server: IPC Server handler 0 on 61004: exiting
12/10/17 17:57:15 INFO ipc.Server: Stopping IPC Server listener on 61004
12/10/17 17:57:15 INFO ipc.Server: Stopping IPC Server Responder
12/10/17 17:57:15 ERROR bsp.BSPTask: Shutting down ping service.
{code}
                
> Make GraphRunner disk based
> ---------------------------
>
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, HAMA-642_unix_3.patch, HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
>
>
> To improve scalability, we can make the graph runner disk based.
> This basically means:
> - We have just a single Vertex instance that gets refilled.
> - We write vertices directly to disk after partitioning.
> - In every superstep we iterate over the vertices on disk, fill the vertex instance, and call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, for example by using the queues we have implemented in the messaging. The graph runner could then be configured to run completely on disk, in cached mode, or in in-memory mode.
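
For readers following the issue description, the disk-based loop it outlines (write vertices once after partitioning, then stream them through a single refilled vertex instance in every superstep) might look roughly like the sketch below. This is only an illustration under stated assumptions: `DiskVertexSketch`, `Vertex`, `Compute`, `writeVertices`, `superstep`, and `readValues` are hypothetical names, not Hama's GraphJobRunner API or the code in the attached patches, and message handling is left out entirely.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these names come from Hama itself.
class DiskVertexSketch {

    // Minimal stand-in for a vertex. Only id and value are serialized,
    // so any other per-vertex state would be lost between supersteps.
    static final class Vertex {
        int id;
        double value;

        void readFields(DataInput in) throws IOException {
            id = in.readInt();
            value = in.readDouble();
        }

        void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeDouble(value);
        }
    }

    // Stand-in for the user's compute function.
    interface Compute {
        void compute(Vertex v);
    }

    // Writes the partitioned vertices to disk: a count followed by (id, value) records.
    static void writeVertices(Path file, int[] ids, double[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(ids.length);
            for (int i = 0; i < ids.length; i++) {
                out.writeInt(ids[i]);
                out.writeDouble(values[i]);
            }
        }
    }

    // One superstep: stream records from 'in', refill the single Vertex
    // instance, call the compute function, and write the result to 'out'.
    static void superstep(Path in, Path out, Compute fn) throws IOException {
        Vertex v = new Vertex(); // the only Vertex instance that is ever allocated
        try (DataInputStream dis = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(in)));
             DataOutputStream dos = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(out)))) {
            int n = dis.readInt();
            dos.writeInt(n);
            for (int i = 0; i < n; i++) {
                v.readFields(dis);
                fn.compute(v);
                v.write(dos);
            }
        }
    }

    // Reads back only the vertex values, in file order.
    static double[] readValues(Path file) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            double[] values = new double[dis.readInt()];
            for (int i = 0; i < values.length; i++) {
                dis.readInt(); // skip the id
                values[i] = dis.readDouble();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        Path current = Files.createTempFile("vertices", ".bin");
        Path next = Files.createTempFile("vertices", ".bin");
        writeVertices(current, new int[] {1, 2, 3}, new double[] {1.0, 2.0, 3.0});
        superstep(current, next, v -> v.value = v.value * 2);
        for (double value : readValues(next)) {
            System.out.println(value);
        }
    }
}
```

Because the scan is strictly sequential, the sketch also makes the second problem visible: delivering a message to an arbitrary vertex would need either messages sorted in the same order as the vertex file or some cached or in-memory lookup, which is what the proposed hybrid mode is meant to address.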

