hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-783) Efficient InMemory Storage for Vertices
Date Fri, 03 Jan 2014 07:17:53 GMT

    [ https://issues.apache.org/jira/browse/HAMA-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861281#comment-13861281 ]

Edward J. Yoon commented on HAMA-783:

I've committed my changes.

Next step:
If we can sort the partition file by vertexID before loading it into VerticesInfo, we can get rid
of the V keyset and enable the DiskVerticesInfo.

Doing this sort in the partitioning phase would be a somewhat heavy task, so my idea is to use an external
sort: 1) divide the partition file into smaller files, 2) sort and re-write each one, and 3) merge the results.
I'll check whether this helps memory usage.
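
For illustration only, the three steps above could be sketched roughly as follows. This is not the committed Hama code (Hama itself is Java); it is a minimal Python sketch that assumes a hypothetical text partition file with one record per line in the form "<vertexID><TAB><payload>", and that vertex IDs can be compared as plain strings. The names external_sort, max_records and _vertex_id are made up for this example.

```python
import heapq
import os
import tempfile


def _vertex_id(line):
    # Assumed record layout: "<vertexID>\t<payload>". IDs are compared as strings.
    return line.split("\t", 1)[0]


def external_sort(src_path, dst_path, max_records=100_000):
    """Sort src_path by vertex ID into dst_path while holding at most
    max_records lines in memory at any time."""
    run_paths = []

    def flush(records):
        # 2) sort one chunk in memory and re-write it as a sorted run file
        records.sort(key=_vertex_id)
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as run:
            run.writelines(records)
        run_paths.append(path)

    try:
        # 1) divide the partition file into chunks of at most max_records lines
        chunk = []
        with open(src_path) as src:
            for line in src:
                if not line.endswith("\n"):
                    line += "\n"  # normalise a final line without a newline
                chunk.append(line)
                if len(chunk) >= max_records:
                    flush(chunk)
                    chunk = []
        if chunk:
            flush(chunk)

        # 3) k-way merge of the sorted runs; heapq.merge reads them lazily,
        #    so only about one line per run is held in memory here
        runs = [open(p) for p in run_paths]
        try:
            with open(dst_path, "w") as dst:
                dst.writelines(heapq.merge(*runs, key=_vertex_id))
        finally:
            for r in runs:
                r.close()
    finally:
        # remove the temporary run files whether or not the merge succeeded
        for p in run_paths:
            os.remove(p)
```

Peak memory in this sketch is bounded by the chunk size rather than by the partition size, which is the property the idea above relies on. Whether it actually reduces memory usage inside Hama would still need to be measured.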

> Efficient InMemory Storage for Vertices
> ---------------------------------------
>                 Key: HAMA-783
>                 URL: https://issues.apache.org/jira/browse/HAMA-783
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>            Reporter: Edward J. Yoon
>             Fix For: 0.7.0
>         Attachments: patch.txt
> Currently there are ListVerticesInfo, DiskVerticesInfo and DirectMemory, but I personally
> think we have to do a big re-design of the vertices storage and graph job runner.
> Actually, the size of a split is not so large, maybe 60 ~ 200MB. Hence, I don't think
> DiskVerticesInfo will be really helpful. Instead, we can use serialization like Spark.
> Update:
> 1) We also need to consider the checkpointing for Fault Tolerance, periodically.
> 2) If DiskVerticesInfo shows good performance, we can use just the DiskVerticesInfo.

This message was sent by Atlassian JIRA