incubator-giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
Date Thu, 19 Apr 2012 07:44:09 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257342#comment-13257342
] 

jiraposter@reviews.apache.org commented on GIRAPH-153:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4801/#review7020
-----------------------------------------------------------


Brian, this kind of stuff will really help lower the bar for using Giraph.  Great work!  Probably
my main comment is that we should get this code to use the same checkstyle.xml as the parent
pom.xml.  Making that change will likely take you a little bit of time.  Once that gets worked
out, we should get this committed.  I definitely intend to use your same approach to adding
the Hive formats as well.

In response to your comments.

1) Users must 'mvn install' the giraph artifact in their local repo, at least until we get
something posted on maven central.

Agreed.  This isn't too bad, but as you mentioned, a README would be useful.  Or you could
edit https://cwiki.apache.org/confluence/display/GIRAPH/, which would be possibly easier than
maintaining documentation here.

2) I modified the pom.xml to exclude the artifact from the rat plugin. I realize this is less
than desirable, but I couldn't get anything running
despite numerous attempts at fixing the "too many unapproved licenses" issues. I'm interested
to hear your guys thoughts.

I'm confused why there were too many unapproved licenses.  This is probably needed though,
this the code has to have the Apache license I believe.

3) Duplicate BspCase in my submodule, at least until Giraph has a test artifact.

Yeah, this sucks, but at least you noted it in the comment.

4) Initializing the AccumuloVertexInputFormat has some procedural limitations inherent in
the format design when run with the GiraphJob. It really expects to have control of the Job
instance. These can be difficult to track down. I tried to document these in my unit tests
and provide some simple error wrapping to help notify users when they see these.

Not an Accumulo expert, so.....okay. =)

5) No README.txt or any wiki entry yet. I figured I'd wait and see what feedback you guys
had.

See my response to 1).


http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/pom.xml
<https://reviews.apache.org/r/4801/#comment15593>

    Brian, I've noticed that there is no usage of the checkstyle plugin, like the parent pom.xml
uses (i.e. 
    
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-checkstyle-plugin</artifactId>
            <version>2.9</version>
            <configuration>
              <configLocation>checkstyle.xml</configLocation>
              <enableRulesSummary>false</enableRulesSummary>
              <headerLocation>license-header.txt</headerLocation>
    	      <failOnError>true</failOnError>
    	      <includeTestSourceDirectory>false</includeTestSourceDirectory>
            </configuration>
            <executions>
              <execution>
                <phase>verify</phase>
                <goals>
                  <goal>check</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
    
    This would be good to have code uniformity.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15594>

    I'm specifying a few examples where the checkstyle.xml will catch you.  Here you would
need to add javadoc for your members.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15595>

    @Override



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15596>

    Extra *



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15597>

    javadoc



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15598>

    @Override



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
<https://reviews.apache.org/r/4801/#comment15599>

    if (e



http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/TestAccumuloVertexFormat.java
<https://reviews.apache.org/r/4801/#comment15600>

    extra lines



http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml
<https://reviews.apache.org/r/4801/#comment15592>

    Would be great to get Apache Rat working with giraph-formats-contrib so that we don't
forget any licenses here.


- Avery


On 2012-04-19 07:27:14, Avery Ching wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4801/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-19 07:27:14)
bq.  
bq.  
bq.  Review request for giraph.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Brian's patch for GIRAPH-153.
bq.  
bq.  
bq.  This addresses bug GIRAPH-153.
bq.      https://issues.apache.org/jira/browse/GIRAPH-153
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/LICENSE.txt
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/license-header.txt
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/pom.xml
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/assembly/compile.xml
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexInputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/accumulo/AccumuloVertexOutputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/HBaseVertexInputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hbase/HBaseVertexOutputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/BspCase.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/TestAccumuloVertexFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeInputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/accumulo/edgemarker/AccumuloEdgeOutputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/TestHBaseRootMarkerVertextFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeInputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/giraph-formats-contrib/src/test/java/org/apache/giraph/format/hbase/edgemarker/TableEdgeOutputFormat.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1327637 
bq.  
bq.  Diff: https://reviews.apache.org/r/4801/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Avery
bq.  
bq.


                
> HBase/Accumulo Input and Output formats
> ---------------------------------------
>
>                 Key: GIRAPH-153
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-153
>             Project: Giraph
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.1.0
>         Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
>            Reporter: Brian Femiano
>         Attachments: GIRAPH-153.patch
>
>
> Four abstract classes that wrap their respective delegate input/output formats for
> easy hooks into vertex input format subclasses. I've included some sample programs that
show two very simple graph
> algorithms. I have a graph generator that builds out a very simple directed structure,
starting with a few 'root' nodes.
> Root nodes are defined as nodes which are not listed as a child anywhere in the graph.

> Algorithm 1) AccumuloRootMarker.java  --> Accumulo as read/write source. Every vertex
starts thinking it's a root. At superstep 0, send a message down to each
> child as a non-root notification. After superstep 1, only root nodes will have never
been messaged. 
> Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by bundling
the notification logic followed by root node propagation. Once we've marked the appropriate
nodes as roots, tell every child which roots it can be traced back to via one or more spanning
trees. This will take N + 2 supersteps where N is the maximum number of hops from any root
to any leaf, plus 2 supersteps for the initial root flagging. 
> I've included all relevant code plus DistributedCacheHelper.java for recursive cache
file and archive searches. It is more hadoop centric than giraph, but these jobs use it so
I figured why not commit here. 
> These have been tested through local JobRunner, pseudo-distributed on the aforementioned
hardware, and full distributed on EC2. More details in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message