incubator-giraph-dev mailing list archives

From "Jakob Homan (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations
Date Tue, 01 Nov 2011 02:00:32 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-64:
------------------------------

    Attachment: GIRAPH-64.patch

Here's a patch that introduces that old bin folder we all know and lo{ve|athe}.  This also
gives us the start of the packaging we'll need once we start thinking about releases.  Users
no longer have to merge their code into the Giraph source to get it to run.
With the new bin/giraph, assuming an implementation of Vertex such as this one (taken from
PageRankBenchmark, obviously):
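{code}package giraph1;

import java.util.Iterator;

// Giraph's Vertex base class (package name as of this patch; adjust if it has moved)
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;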

public class FirstVertex extends
    Vertex<LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
    /** Configuration from Configurable */
    private Configuration conf;

    /** How many supersteps to run */
    public static final String SUPERSTEP_COUNT = "PageRankBenchmark.superstepCount";

    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException {
    }

    @Override
    public void postApplication() {
    }

    @Override
    public void preSuperstep() {
    }

    @Override
    public void compute(Iterator<DoubleWritable> msgIterator) {
        if (getSuperstep() >= 1) {
            double sum = 0;
            while (msgIterator.hasNext()) {
                sum += msgIterator.next().get();
            }
            DoubleWritable vertexValue =
                new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
            setVertexValue(vertexValue);
        }

        if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
            long edges = getNumOutEdges();
            sendMsgToAllEdges(new DoubleWritable(getVertexValue().get() / edges));
        } else {
            voteToHalt();
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

}{code}
one can run it via:
{noformat}bin/giraph \
-DPageRankBenchmark.superstepCount=30 \
-DpseduoRandomVertexReader.aggregateVertices=220 \
-DpseduoRandomVertexReader.edgesPerVertex=37 \
~/kick-ass-vertex-1.0.jar giraph1.FirstVertex \
-w 10 \
-if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat \
-of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
-op output_path{noformat}
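For anyone who wants to sanity-check the arithmetic in FirstVertex.compute() without a cluster, here is a minimal standalone sketch of the same update rule. It is not part of the patch and uses no Giraph or Hadoop types; the class name PageRankSketch, the adjacency-array layout, and the use of doubles throughout (the vertex above mixes in float literals) are my own assumptions for illustration.

```java
import java.util.Arrays;

/** Standalone sketch of the compute() update rule; hypothetical, not from the patch. */
public class PageRankSketch {
    /**
     * Runs the update rule until every vertex has voted to halt.
     * outEdges[v] lists the targets of vertex v's out-edges (assumed layout).
     */
    public static double[] run(int[][] outEdges, double[] initial, int superstepCount) {
        int n = outEdges.length;
        double[] value = Arrays.copyOf(initial, n);
        double[] inbox = new double[n];          // summed messages delivered this superstep
        for (int superstep = 0; ; superstep++) {
            double[] outbox = new double[n];     // summed messages for the next superstep
            for (int v = 0; v < n; v++) {
                if (superstep >= 1) {
                    // Same rule as compute(): 0.15 / numVertices + 0.85 * sum of messages
                    value[v] = 0.15 / n + 0.85 * inbox[v];
                }
                if (superstep < superstepCount) {
                    // Equivalent of sendMsgToAllEdges(value / numOutEdges)
                    for (int target : outEdges[v]) {
                        outbox[target] += value[v] / outEdges[v].length;
                    }
                }
            }
            if (superstep >= superstepCount) {
                // Equivalent of voteToHalt(): nothing was sent, so the run ends here
                return value;
            }
            inbox = outbox;
        }
    }

    public static void main(String[] args) {
        // Three-vertex cycle 0 -> 1 -> 2 -> 0, each vertex starting at 1/3
        int[][] cycle = {{1}, {2}, {0}};
        double[] start = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        System.out.println(Arrays.toString(run(cycle, start, 30)));
    }
}
```

As in the vertex above, a superstep count of -1 (the default when the property is unset) makes every vertex halt in superstep 0 with its initial value untouched.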
bin/giraph is heavily cribbed from Mahout and Pig, btw.
Is there any reason the fatjar approach was taken other than expediency?  This patch uses
the fatjar approach for testing, but uses a standard lib folder approach for the actual package.
I'd like to remove the fatjar entirely, eventually.

This is a rough script and will need lots of enhancements as we go, but I think it's a good
start.
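
To make the shape of the bin/giraph invocation above concrete: it mixes -Dkey=value properties, two bare arguments (the user jar and the vertex class), and flag/value pairs (-w, -if, -of, -op). The sketch below shows one way such an argument list could be split into those three groups. It is purely illustrative and is not how the patch does it; the class RunnerArgsSketch and its fields are hypothetical names, and the sketch does not interpret what any flag means.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: splits a bin/giraph-style argument list into its parts. */
public class RunnerArgsSketch {
    /** -Dkey=value pairs, in order of appearance. */
    public final Map<String, String> properties = new LinkedHashMap<>();
    /** Flag/value pairs such as "-w 10", keyed without the leading dash. */
    public final Map<String, String> options = new LinkedHashMap<>();
    /** Bare arguments (in the example: the user jar, then the vertex class). */
    public final List<String> positional = new ArrayList<>();

    public static RunnerArgsSketch parse(String[] args) {
        RunnerArgsSketch parsed = new RunnerArgsSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-D") && arg.contains("=")) {
                // Split "-Dkey=value" at the first '='
                int eq = arg.indexOf('=');
                parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else if (arg.startsWith("-")) {
                // Every other flag is assumed to take exactly one value
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                parsed.options.put(arg.substring(1), args[++i]);
            } else {
                parsed.positional.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        RunnerArgsSketch parsed = parse(new String[] {
            "-DPageRankBenchmark.superstepCount=30",
            "kick-ass-vertex-1.0.jar", "giraph1.FirstVertex",
            "-w", "10",
            "-op", "output_path"
        });
        System.out.println(parsed.properties);
        System.out.println(parsed.positional);
        System.out.println(parsed.options);
    }
}
```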
                
> Create VertexRunner to make it easier to run users' computations
> ----------------------------------------------------------------
>
>                 Key: GIRAPH-64
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-64
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-64.patch
>
>
> Currently, if a user wants to implement a Giraph algorithm by extending {{Vertex}} they
> must also write all the boilerplate around the {{Tool}} interface and bundle it with the Giraph
> jar (or get Giraph on the classpath and playing nice with the implementation).  For example,
> what is included in the PageRankBenchmark and what Kohei has done: https://github.com/smly/java-Giraph-LabelPropagation
> It would be better if we had perhaps a Vertex implementation to be subclassed that already
> had all the standard Tooling included such that all one had to run would be (assuming the
> Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat}
> This wouldn't work with every algorithm, but would be useful in a large number of cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
