giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nitay Joffe" <ni...@apache.org>
Subject Re: Review Request: Allow creation of graph by adding edges that span multiple workers
Date Wed, 31 Oct 2012 20:43:27 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7784/#review12969
-----------------------------------------------------------


This is really cool stuff Alessandro. Logic and design wise it looks sweet to me.
The main comments I have are around too much copy/pasting. There are many places where I think
the copy/paste can be cleaned up fairly easily. See detailed comments for examples.



/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
<https://reviews.apache.org/r/7784/#comment27903>

    fits on one line?



/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
<https://reviews.apache.org/r/7784/#comment27902>

    fits on one line?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspService.java
<https://reviews.apache.org/r/7784/#comment27904>

    I see a lot of duplication here. If these four and the previous four paths for vertexes
are going to be kept around for the long term (not just some intermediate thing while we're
developing quickly), how about we stick them in a class? Then have one instance for vertexes
and one for edges? Something like zkInputSplitPaths?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27905>

    Let's not be lazy. Make a common function with generateVertexInputSplits() and call it
here. Copy/paste is bad form and has bitten us many times before.



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27906>

    as far as i can tell this is the only line that actually throws anything. Can we make
the try/catch be only around this and do everything else outside of it?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27907>

    can we make a GiraphConfiguration.isUsingVertexInputFormat() or something? This null check
on a class is not necessarily intuitive to an incoming user?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27909>

    Again let's avoid copy/paste. I don't think it is good form to check in code like this
with notions of fixing it "later". It creates clutter for everyone else, raises the barrier
for anyone looking to jump into the codebase, and rarely gets fully cleaned up.
     
    I think if you create my zkInputSplitPaths helper class it should be much easier to consolidate
this with createVertexInputSplits() above?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27908>

    similar for this, isUsingEdgeInput or something?



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27910>

    again the isUsingXXInputSplit() whatnot



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
<https://reviews.apache.org/r/7784/#comment27911>

    This one path string is literally the _only_ difference from this and WriteVertexInputSplit
as far as I can tell. Make it a member variable and use one class please instead of copying
>50 lines of copy.



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/7784/#comment27912>

    Oh god, we really should not need to use this. It is a sign of bad code. Please at the
very least open up a JIRA to clean up this setup() call asap or something. If checkstyle is
throwing errors then this method must really be way too long. Can we chop it up into the five
steps described in the comment (assuming that comment still valid)?
    
    For most programmers once a method goes over say ~150-200 lines and doesn't fit on one
screen it becomes very hard to keep the context in your head.



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/7784/#comment27913>

    isUsingXXInputSplit whatnot



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/7784/#comment27914>

    for example this whole thing could be a function, ensureEdgeInputSplitsReady() or something.
Also again it should be consolidated with vertex version above.



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/7784/#comment27915>

    waitForOtherEdgeInputSplits() or something?
    consolidate with Vertex version.



/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/7784/#comment27916>

    nice :)



/trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeInputSplitsCallable.java
<https://reviews.apache.org/r/7784/#comment27917>

    I realize I may be oversimplifying things, but it seems to me this function is really
the main / only difference between this class and VertexInputSplitCallable? How about making
a base class they both inherit from with all the common pieces?



/trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeReader.java
<https://reviews.apache.org/r/7784/#comment27918>

    s/vertices/edges ?



/trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeWithSource.java
<https://reviews.apache.org/r/7784/#comment27919>

    Neat and clean, I like it :)



/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexInputSplitsCallable.java
<https://reviews.apache.org/r/7784/#comment27920>

    I'm assuming this is same as InputSplitsCallable as before right?



/trunk/giraph/src/main/java/org/apache/giraph/io/GiraphFileInputFormat.java
<https://reviews.apache.org/r/7784/#comment27921>

    As far as I can tell this is the only real difference between this function and listEdgeStatus().
Please clean this up with a common function.



/trunk/giraph/src/main/java/org/apache/giraph/io/GiraphFileInputFormat.java
<https://reviews.apache.org/r/7784/#comment27922>

    Again this line looks like the only real difference between this function and getEdgeSplits().
Please clean this up as well.



/trunk/giraph/src/main/java/org/apache/giraph/io/IntIntTextVertexValueInputFormat.java
<https://reviews.apache.org/r/7784/#comment27924>

    Can we change Integer[] to some IntPair type thing? Should be more efficient too



/trunk/giraph/src/main/java/org/apache/giraph/io/IntIntTextVertexValueInputFormat.java
<https://reviews.apache.org/r/7784/#comment27923>

    Guava's stuff might actually be faster / better for this? Splitter TAB_SPLITTER = Splitter.on("\t")
up top, then
    TAB_SPLITTER.split(line.toString()) here?



/trunk/giraph/src/main/java/org/apache/giraph/io/IntNullTextEdgeInputFormat.java
<https://reviews.apache.org/r/7784/#comment27926>

    similarly, Guava Splitter?



/trunk/giraph/src/main/java/org/apache/giraph/io/IntNullTextEdgeInputFormat.java
<https://reviews.apache.org/r/7784/#comment27925>

    Similarly, IntPair?


- Nitay Joffe


On Oct. 30, 2012, 11:44 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7784/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2012, 11:44 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/GIRAPH-155
> 
> 
> This addresses bug GIRAPH-155.
>     https://issues.apache.org/jira/browse/GIRAPH-155
> 
> 
> Diffs
> -----
> 
>   /trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java 1403451

>   /trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerServer.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1403451

>   /trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendPartitionMutationsRequest.java
1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BasicVertexValueReader.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspService.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeInputSplitsCallable.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeReader.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeWithSource.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/InputSplitsCallable.java 1403451

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/MasterThread.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexInputSplitsCallable.java
PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexReader.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexResolver.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexValueInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexValueReader.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/GiraphFileInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/GiraphTextInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntIntTextVertexValueInputFormat.java
PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntNullTextEdgeInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextEdgeInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextVertexInputFormat.java 1403451

>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextVertexValueInputFormat.java PRE-CREATION

>   /trunk/giraph/src/main/java/org/apache/giraph/utils/FileUtils.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1403451

>   /trunk/giraph/src/test/java/org/apache/giraph/BspCase.java 1403451 
>   /trunk/giraph/src/test/java/org/apache/giraph/TestEdgeInput.java PRE-CREATION 
>   /trunk/giraph/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1403451 
> 
> Diff: https://reviews.apache.org/r/7784/diff/
> 
> 
> Testing
> -------
> 
> 1) mvn verify
> 2) pseudo-distributed tests
> 3) PageRank on cluster with real text input (will post perf results on the issue)
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message