giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-26) Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
Date Tue, 21 Aug 2012 01:27:38 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438378#comment-13438378 ]

Eli Reisman commented on GIRAPH-26:
-----------------------------------

This is looking really great, nice work.

- I'd like to go over how your solution makes sure each worker processes a unique part of
the graph. Was using the SplitIndex IDs good enough for your purposes, or do you still need
each worker to process a range of vertex IDs? Do you have other requirements on your wish
list as far as guaranteeing that exactly one worker processes each "virtual input split,"
as you indicated before?

- This is seriously mathematical stuff, so var names like "lowDecisionBoundary" and "upperEdgeRatio"
are fantastic. Maybe replace names like "tempP1" and "highCurrCumsum" and "array" with something
long and annoying and super easy to read/understand. Sorry. Go easy on us, most of us have
public school educations. ;)

- There are a couple of typos in the comments. In general, maybe slap a Javadoc comment on
every method, even the Overrides, since you're doing nonstandard stuff here, and include @param
and @return tags for all. Don't be afraid to throw in a few more inline comments in the methods,
just a one-liner here and there to give the reader a heads up about each major step in the
algorithm code.

- Not sure what's up with the "procID" as a random seed. If you need a different seed per worker,
you can probably dig up the hostname/port combo and hash them from where your code sits in
the framework. Let me know if you're curious about this option.

- {insert brilliant math review here...}

- I'm sorry in advance: what about...a unit test? Again, let me know if you need a leg up
on this. If someone feels comfortable giving the math a thumbs up without an included test
case, we're probably good here.

- If Jakob were here, he'd say "don't delete old patches when you put up a new one." Of course,
I've zapped a few too :) so I can't say it.
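
To make the per-worker seed suggestion above a bit more concrete, here is a minimal sketch of
hashing a hostname/port pair into a seed. The class name "WorkerSeed", the method "seedFor",
and the port value are all made up for illustration; how you actually get the hostname and
port from inside the framework is not shown, and the mixing function is just one simple choice,
not anything Giraph prescribes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Random;

/** Hypothetical helper: derives a per-worker random seed from host name and port. */
public class WorkerSeed {

  /**
   * Mixes the host name and port into a 64-bit value.
   * The same (hostname, port) pair always yields the same seed.
   *
   * @param hostname host name of the worker
   * @param port port the worker listens on
   * @return seed value suitable for java.util.Random
   */
  public static long seedFor(String hostname, int port) {
    long hash = 1125899906842597L; // arbitrary odd starting value
    for (int i = 0; i < hostname.length(); i++) {
      hash = 31 * hash + hostname.charAt(i); // polynomial string hash, 64-bit
    }
    return 31 * hash + port; // fold the port in last
  }

  public static void main(String[] args) throws UnknownHostException {
    String hostname = InetAddress.getLocalHost().getHostName();
    int port = 30000; // assumption: placeholder, not a real Giraph setting
    Random random = new Random(seedFor(hostname, port));
    System.out.println("first draw for this worker: " + random.nextInt(100));
  }
}
```

Two workers on the same host get different seeds as long as their ports differ, and a rerun
on the same host/port reproduces the same sequence, which is handy when debugging.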

Great work, Sean. Impressive stuff. Congrats again on the paper, too!
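
P.S. On the unit test idea: a rough sketch of the kind of check I have in mind is below. It
does NOT use the code in the patch. The sampler here is a stand-in (inverse-transform sampling
of a truncated power law), and the names "PowerLawDegreeCheck", "sampleDegree", and
"degreeHistogram", plus the exponent 2.0 and max degree 1000, are assumptions for illustration
only. The point is the shape of the test: fix the seed, then assert on reproducibility and on
the low-degree vertices dominating.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sanity check for a power-law degree sampler (not the patch's code). */
public class PowerLawDegreeCheck {

  /**
   * Draws one degree in [1, maxDegree] by inverting the CDF of a density
   * proportional to x^(-exponent) on [1, maxDegree + 1), then flooring.
   * Assumes exponent != 1.
   */
  public static int sampleDegree(Random random, double exponent, int maxDegree) {
    double u = random.nextDouble();
    double oneMinusExponent = 1.0 - exponent;
    double upper = Math.pow(maxDegree + 1.0, oneMinusExponent);
    double x = Math.pow(1.0 + u * (upper - 1.0), 1.0 / oneMinusExponent);
    return (int) Math.min(maxDegree, Math.floor(x));
  }

  /** Counts how often each degree is drawn; index 0 is unused. */
  public static int[] degreeHistogram(long seed, int samples, double exponent, int maxDegree) {
    Random random = new Random(seed);
    int[] counts = new int[maxDegree + 1];
    for (int i = 0; i < samples; i++) {
      counts[sampleDegree(random, exponent, maxDegree)]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] first = degreeHistogram(42L, 100000, 2.0, 1000);
    int[] second = degreeHistogram(42L, 100000, 2.0, 1000);
    // Same seed must reproduce the same graph shape.
    if (!Arrays.equals(first, second)) {
      throw new AssertionError("sampler is not deterministic for a fixed seed");
    }
    // Heavy-tail smoke test: low degrees should be much more common than higher ones.
    if (!(first[1] > first[2] && first[2] > first[4])) {
      throw new AssertionError("degree counts do not fall off as expected");
    }
    System.out.println("checks passed");
  }
}
```

A real test would call the input format itself and probably compare against the expected
distribution more carefully, but even a smoke test like this catches seed and off-by-one bugs.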
                
> Improve PseudoRandomVertexInputFormat to create a more realistic synthetic graph (e.g. power-law distributed vertex-cardinality).
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-26
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-26
>             Project: Giraph
>          Issue Type: Test
>          Components: benchmark
>    Affects Versions: 0.2.0
>            Reporter: Jake Mannix
>            Assignee: Sean Choi
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-26.patch
>
>
> The PageRankBenchmark class, to be a proper benchmark, should run over graphs which look
> more like data seen in the wild, and web link graphs, social network graphs, and text corpora
> (represented as a bipartite graph) all have power-law distributions, so benchmarking a synthetic
> graph which looks more like this would be a nice test which would stress cases of uneven split-distribution
> and bottlenecks of subclusters of the graph of heavily connected vertices.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
