giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
Date Wed, 08 Aug 2012 21:59:21 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431437#comment-13431437
] 

Eli Reisman commented on GIRAPH-26:
-----------------------------------

Sounds great. New word on high is, for apps keep the constant defs in your IO or vertex class
rather than GiraphJob and then just eliminate the run() method and that sort of stuff. If
this is all old news for the new patch, then disregard. Really looking forward to the patch.

You mentioned that it generates random edge weight data as well as edges themselves. I wonder,
are these just Random() values or are they generated with a particular distribution such as
the way your edges are generated? if they are just straight up random, could we have configs
for those too? Like someone might want to do shortest paths without negative edge weights,
some might have other value needs for their algorithm tests (values all greater than 100 or
less than 1000 etc.) Just curious not a big deal.

                
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g.
power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-26
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-26
>             Project: Giraph
>          Issue Type: Test
>          Components: benchmark
>    Affects Versions: 0.2.0
>            Reporter: Jake Mannix
>            Assignee: Sean Choi
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-26-1.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs which look
more like data seen in the wild, and web link graphs, social network graphs, and text corpora
(represented as a bipartite graph) all have power-law distributions, so benchmarking a synthetic
graph which looks more like this would be a nice test which would stress cases of uneven split-distribution
and bottlenecks of subclusters of the graph of heavily connected vertices.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message