giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
Date Tue, 07 Aug 2012 00:33:02 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429618#comment-13429618 ]

Eli Reisman commented on GIRAPH-26:
-----------------------------------

Sean, nice work. Some things to think about:

To get your input seed matrix into this new input format, try adding some constants to
GiraphJob.java in the graph/ directory:

/** Allows a seed matrix to be entered at the command line in the format:
 * [0.1 0.2 0.3 seans.format listed.here etc... ]
 */
public static final String KROENECKER_SEED_MATRIX = "giraph.kroenecker.seed";

/** A default setting for KROENECKER_SEED_MATRIX if no command line argument is supplied */
public static final String KROENECKER_SEED_MATRIX_DEFAULT =
    "0.5 11.1 .33 .more .clever .numbers .here";

Then, inside your code, you can look up these keys in the Configuration. The get methods take
a default argument, so you can pass KROENECKER_SEED_MATRIX_DEFAULT where you currently have ""
for the case where nothing is entered. This also avoids hardcoding the defaults inside your IO
format itself and colocates them with the other defaults, where new users can review all the
options at once. Soon there will be a specific class for this, GiraphConf, but for now GiraphJob
is the place to put it. GiraphRunner and the framework will ensure that if someone enters data
under "giraph.kroenecker.seed", it ends up in the Configuration object, ready for you to pull
out using the technique your code already employs.
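
A rough sketch of that lookup, in case it helps. To keep it runnable without Hadoop on the
classpath, it uses java.util.Properties as a stand-in for the Configuration object;
Properties.getProperty(key, default) plays the role of Configuration.get(name, defaultValue).
The class name, the readSeed helper, and the numeric default are made up for illustration and
are not part of Giraph or of the patch.

```java
import java.util.Properties;

/** Illustrative only: reads a space-separated seed matrix, falling back to a default. */
public class SeedMatrixConfig {
  /** Configuration key suggested above. */
  public static final String KROENECKER_SEED_MATRIX = "giraph.kroenecker.seed";

  /** Hypothetical default; the real values are for the patch author to choose. */
  public static final String KROENECKER_SEED_MATRIX_DEFAULT = "0.9 0.5 0.5 0.1";

  /**
   * Returns the seed entries found under KROENECKER_SEED_MATRIX, or the default
   * entries when the key is missing or blank. Properties stands in for Hadoop's
   * Configuration here (an assumption made to keep the sketch self-contained).
   */
  public static double[] readSeed(Properties conf) {
    String raw = conf.getProperty(KROENECKER_SEED_MATRIX, KROENECKER_SEED_MATRIX_DEFAULT);
    if (raw.trim().isEmpty()) {
      // Treat an empty value the same as a missing one.
      raw = KROENECKER_SEED_MATRIX_DEFAULT;
    }
    String[] tokens = raw.trim().split("\\s+");
    double[] seed = new double[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      // Throws NumberFormatException on a malformed entry.
      seed[i] = Double.parseDouble(tokens[i]);
    }
    return seed;
  }
}
```

With the real Configuration, the lookup line would presumably become
conf.get(GiraphJob.KROENECKER_SEED_MATRIX, GiraphJob.KROENECKER_SEED_MATRIX_DEFAULT), assuming
the constants land in GiraphJob as suggested.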

I want to ask a few more things but I will wait for the updated patch. This will be super
useful to all of us for testing our code, thanks!

                
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g.
> power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-26
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-26
>             Project: Giraph
>          Issue Type: Test
>          Components: benchmark
>    Affects Versions: 0.2.0
>            Reporter: Jake Mannix
>            Assignee: Sean Choi
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-26-1.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs which look
> more like data seen in the wild, and web link graphs, social network graphs, and text corpora
> (represented as a bipartite graph) all have power-law distributions, so benchmarking a synthetic
> graph which looks more like this would be a nice test which would stress cases of uneven
> split-distribution and bottlenecks of subclusters of the graph of heavily connected vertices.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
