incubator-giraph-dev mailing list archives

From "Claudio Martella (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists
Date Thu, 17 Nov 2011 20:17:54 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152291#comment-13152291 ]

Claudio Martella commented on GIRAPH-96:
----------------------------------------

It is indeed a nice discussion. The amount of data to be read is the same after all, but here
we're talking about random I/O. It's a possibility. Also, in answer to Gianmarco: the idea
of an HBase InputReader is the same as the current discussion on supporting Hive, Pig
and HCatalog. If you store your data in HBase, it can be quite useful, just as it is
for MR. The lazy-loading approach is worth investigating, although it would only be necessary
for huge graphs or, as in my case, for computations that don't necessarily touch the whole
graph.
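A minimal sketch of the lazy approach, assuming a hypothetical `loader` callback stands in for a read from the backing store (for example, one HBase scan per vertex). A vertex's edges are fetched only the first time they are requested and then cached, so vertices that a computation never touches cost no I/O at all. `LazyAdjacency` and `loader` are illustrative names, not Giraph API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: a vertex's adjacency list is loaded on first access only.
public class LazyAdjacency {
    private final long vertexId;
    // Stand-in for a backing-store read (assumption: one call = one scan).
    private final LongFunction<List<Long>> loader;
    private List<Long> edges; // null until the first getEdges() call

    public LazyAdjacency(long vertexId, LongFunction<List<Long>> loader) {
        this.vertexId = vertexId;
        this.loader = loader;
    }

    public boolean isLoaded() {
        return edges != null;
    }

    // Fetches the edges once, then serves the cached, read-only copy.
    public List<Long> getEdges() {
        if (edges == null) {
            edges = Collections.unmodifiableList(new ArrayList<>(loader.apply(vertexId)));
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> store = new HashMap<>();
        store.put(1L, List.of(2L, 3L));
        store.put(2L, List.of(3L));
        int[] reads = {0}; // counts simulated store reads
        LongFunction<List<Long>> loader = id -> {
            reads[0]++;
            return store.getOrDefault(id, List.of());
        };
        LazyAdjacency v1 = new LazyAdjacency(1L, loader);
        LazyAdjacency v2 = new LazyAdjacency(2L, loader);
        System.out.println(v1.getEdges()); // [2, 3]
        v1.getEdges();                     // served from the cache
        System.out.println(reads[0]);      // 1
        System.out.println(v2.isLoaded()); // false: v2 was never touched
    }
}
```

The total volume read for a vertex is unchanged; what changes is that the read happens on demand, which is exactly why the access pattern becomes random I/O rather than one sequential pass.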

Good out-of-core data structures/maps are hard to find; maybe LinkedIn's Krati
or LevelDB (but I guess we'd have license issues there).
                
> Support for Graphs with Huge adjacency lists
> --------------------------------------------
>
>                 Key: GIRAPH-96
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-96
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.70.0
>            Reporter: Arun Suresh
>
> Currently the vertex initialize() method is passed the complete adjacency list as a HashMap.
> All the current concrete implementations of Vertex iterate over the adjacency list and build
> new data structures within the Vertex instance to hold/manipulate the adjacency list. This
> would cease to be feasible once the adjacency list becomes really huge.
> I propose storing the adjacency list and all vertex information (and incoming messages?)
> in a distributed data store such as HBase. The adjacency list can be lazily loaded via
> HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of
> VertexID+OutboundVertexId, with a single column containing the edge.
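A minimal sketch of the VertexID+OutboundVertexId row-key schema described in the issue, under stated assumptions: an in-memory `TreeMap` ordered by unsigned bytes stands in for an HBase table (HBase keeps rows sorted by row-key bytes), and IDs are non-negative `long`s encoded as fixed-width 8-byte big-endian values. Fixed width matters: with variable-length string IDs and no delimiter, a key prefix for vertex `1` would also match rows belonging to vertex `12`. `EdgeTableSketch`, `putEdge` and `targetsOf` are hypothetical names; against a real table, the same start and stop rows would bound an HBase scan.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a sorted in-memory map stands in for an HBase table.
// Row key = 8-byte VertexID followed by 8-byte OutboundVertexId (big-endian),
// so all edges of one vertex occupy one contiguous key range.
public class EdgeTableSketch {
    // Unsigned lexicographic byte order, mimicking how HBase sorts row keys.
    private final NavigableMap<byte[], String> table =
            new TreeMap<byte[], String>(Arrays::compareUnsigned);

    // Assumes non-negative IDs: under unsigned comparison, negative longs
    // (sign bit set) would sort after all positive ones.
    static byte[] rowKey(long vertexId, long targetId) {
        return ByteBuffer.allocate(16).putLong(vertexId).putLong(targetId).array();
    }

    static byte[] prefix(long vertexId) {
        return ByteBuffer.allocate(8).putLong(vertexId).array();
    }

    // One row per edge; the single "column" is modelled as the map value.
    public void putEdge(long vertexId, long targetId, String edgeValue) {
        table.put(rowKey(vertexId, targetId), edgeValue);
    }

    // Range read equivalent to start row = prefix(v), stop row = prefix(v + 1).
    // An 8-byte prefix sorts before every 16-byte key that begins with it.
    public List<Long> targetsOf(long vertexId) {
        if (vertexId < 0 || vertexId == Long.MAX_VALUE) {
            throw new IllegalArgumentException("sketch assumes 0 <= vertexId < Long.MAX_VALUE");
        }
        List<Long> targets = new ArrayList<>();
        for (Map.Entry<byte[], String> e
                : table.subMap(prefix(vertexId), true, prefix(vertexId + 1), false).entrySet()) {
            // Decode the OutboundVertexId from the second half of the key.
            targets.add(ByteBuffer.wrap(e.getKey(), 8, 8).getLong());
        }
        return targets;
    }
}
```

With this layout, adding or removing one edge touches a single row instead of rewriting a vertex's whole adjacency list; the trade-off is one row per edge, so a vertex with a huge adjacency list becomes a long key range that a scan can page through.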

