tinkerpop-commits mailing list archives

From spmalle...@apache.org
Subject [2/3] incubator-tinkerpop git commit: broke apart the blvp docs to get them in their individual sections in implementations.
Date Fri, 16 Oct 2015 20:34:29 GMT
broke apart the blvp docs to get them in their individual sections in implementations.


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/dd178191
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/dd178191
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/dd178191

Branch: refs/heads/master
Commit: dd1781911d84f9bd711ebf92c1383ebfaef37b4c
Parents: e6267c7
Author: Stephen Mallette <spmva@genoprime.com>
Authored: Fri Oct 16 16:22:45 2015 -0400
Committer: Stephen Mallette <spmva@genoprime.com>
Committed: Fri Oct 16 16:22:45 2015 -0400

----------------------------------------------------------------------
 docs/src/implementations.asciidoc   | 147 +++++++++++++++++++++++++++++++
 docs/src/the-graphcomputer.asciidoc | 111 +++--------------------
 2 files changed, 158 insertions(+), 100 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/dd178191/docs/src/implementations.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/implementations.asciidoc b/docs/src/implementations.asciidoc
index c9395b2..565c467 100644
--- a/docs/src/implementations.asciidoc
+++ b/docs/src/implementations.asciidoc
@@ -714,6 +714,35 @@ graph.close()
 
 IMPORTANT: `LabelP.of()` is only required if multi-labels are leveraged. `LabelP.of()` is used when filtering/looking-up vertices by their label(s) as the standard `P.eq()` does a direct match on the `::`-representation of `vertex.label()`.
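
To make the distinction concrete, here is a minimal, hypothetical sketch (it assumes a throwaway `Neo4jGraph` at `/tmp/neo4j` and a console with the Neo4j plugin active so that `LabelP` resolves); the counts simply illustrate the behavior described above:

[gremlin-groovy]
----
// hypothetical sketch: a vertex carrying the multi-label "person::animal"
graph = Neo4jGraph.open('/tmp/neo4j')
graph.addVertex(label, 'person::animal')
g = graph.traversal()
g.V().hasLabel('person').count()              // 0 -- P.eq() compares against the full "person::animal" string
g.V().has(label, LabelP.of('person')).count() // 1 -- LabelP.of() matches any one of the vertex's labels
graph.close()
----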
 
+Loading with BulkLoaderVertexProgram
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large amounts of data to and from Neo4j. The following code demonstrates how to load the modern graph from TinkerGraph into Neo4j:
+
+[gremlin-groovy]
+----
+wgConf = 'conf/neo4j-standalone.properties'
+modern = TinkerFactory.createModern()
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(modern)
+modern.compute().program(blvp).submit().get()
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# neo4j-standalone.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
+gremlin.neo4j.directory=/tmp/neo4j
+gremlin.neo4j.conf.node_auto_indexing=true
+gremlin.neo4j.conf.relationship_auto_indexing=true
+----
+
 [[hadoop-gremlin]]
 Hadoop-Gremlin
 --------------
@@ -884,6 +913,68 @@ result.memory.keys()
 result.memory.get('~reducing')
 ----
 
+Loading with BulkLoaderVertexProgram
+++++++++++++++++++++++++++++++++++++
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large amounts of data to and from different `Graph` implementations. The following code demonstrates how to load the Grateful Dead graph from HadoopGraph into TinkerGraph over Giraph:
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/grateful-dead.kryo', 'data/grateful-dead.kryo')
+wgConf = 'conf/tinkergraph-gryo.properties'
+grateful = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(grateful)
+grateful.compute(GiraphGraphComputer).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# hadoop-grateful-gryo.properties
+
+#
+# Hadoop Graph Configuration
+#
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
+gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
+gremlin.hadoop.inputLocation=data/grateful-dead.kryo
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.deriveMemory=false
+gremlin.hadoop.jarsInDistributedCache=true
+
+#
+# GiraphGraphComputer Configuration
+#
+giraph.minWorkers=1
+giraph.maxWorkers=1
+giraph.useOutOfCoreGraph=true
+giraph.useOutOfCoreMessages=true
+mapred.map.child.java.opts=-Xmx1024m
+mapred.reduce.child.java.opts=-Xmx1024m
+giraph.numInputThreads=4
+giraph.numComputeThreads=4
+giraph.maxMessagesInMemory=100000
+----
+
+[source,properties]
+----
+# tinkergraph-gryo.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+----
+
+NOTE: The path to TinkerGraph needs to be included in the `HADOOP_GREMLIN_LIBS` for the above example to work.
+
 [[sparkgraphcomputer]]
 SparkGraphComputer
 ^^^^^^^^^^^^^^^^^^
@@ -914,6 +1005,62 @@ image::spark-algorithm.png[width=775]
 
 IMPORTANT: If the vendor/user wishes to bypass using Hadoop `InputFormats` for pulling data from the underlying graph system, it is possible to leverage Spark's RDD constructs directly. There is a `gremlin.hadoop.graphInputRDD` configuration that references a `Class<? extends InputRDD>`. An `InputRDD` provides a read method that takes a `SparkContext` and returns a graphRDD. Likewise, to bypass `OutputFormat`, use `gremlin.hadoop.graphOutputRDD` and the respective `OutputRDD` with its write-based method.
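
As a rough, hypothetical sketch of that wiring (the `com.example.*` class names below are placeholders, not real classes), the configuration side might look like the following; job submission then proceeds as in the `SparkGraphComputer` examples:

[gremlin-groovy]
----
// hypothetical sketch: bypass InputFormat/OutputFormat with user-supplied RDD classes
conf = new BaseConfiguration()
conf.setProperty('gremlin.graph', 'org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph')
conf.setProperty('gremlin.hadoop.graphInputRDD', 'com.example.MyInputRDD')    // placeholder InputRDD implementation
conf.setProperty('gremlin.hadoop.graphOutputRDD', 'com.example.MyOutputRDD')  // placeholder OutputRDD implementation
conf.setProperty('gremlin.hadoop.outputLocation', 'output')
graph = GraphFactory.open(conf)
----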
 
+Loading with BulkLoaderVertexProgram
+++++++++++++++++++++++++++++++++++++
+
+The <<bulkloadervertexprogram, BulkLoaderVertexProgram>> is a generalized bulk loader that can be used to load large amounts of data to and from different `Graph` implementations. The following code demonstrates how to load the Grateful Dead graph from HadoopGraph into TinkerGraph over Spark:
+
+[gremlin-groovy]
+----
+hdfs.copyFromLocal('data/grateful-dead.kryo', 'data/grateful-dead.kryo')
+wgConf = 'conf/tinkergraph-gryo.properties'
+grateful = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(grateful)
+grateful.compute(SparkGraphComputer).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# hadoop-grateful-gryo.properties
+
+#
+# Hadoop Graph Configuration
+#
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
+gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
+gremlin.hadoop.inputLocation=data/grateful-dead.kryo
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.deriveMemory=false
+gremlin.hadoop.jarsInDistributedCache=true
+
+#
+# SparkGraphComputer Configuration
+#
+spark.master=local[1]
+spark.executor.memory=1g
+spark.serializer=org.apache.spark.serializer.KryoSerializer
+----
+
+[source,properties]
+----
+# tinkergraph-gryo.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+----
+
+NOTE: The path to TinkerGraph needs to be included in the `HADOOP_GREMLIN_LIBS` for the above example to work.
+
 [[mapreducegraphcomputer]]
 MapReduceGraphComputer
 ^^^^^^^^^^^^^^^^^^^^^^

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/dd178191/docs/src/the-graphcomputer.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/the-graphcomputer.asciidoc b/docs/src/the-graphcomputer.asciidoc
index 039f7e3..96bcd0f 100644
--- a/docs/src/the-graphcomputer.asciidoc
+++ b/docs/src/the-graphcomputer.asciidoc
@@ -236,119 +236,30 @@ The `PeerPressureVertexProgram` is a clustering algorithm that assigns a nominal
   .. If there is a tie, then the cluster with the lowest `toString()` comparison is selected.
  . Steps 3 and 4 repeat until either a max number of iterations has occurred or no vertex has adjusted its cluster anymore.
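
To see the algorithm above in action, the following is a minimal sketch (it assumes the modern toy graph and the program's default settings; the computed cluster is exposed as a vertex property on the result graph):

[gremlin-groovy]
----
// minimal sketch: run peer pressure clustering over the modern toy graph
graph = TinkerFactory.createModern()
result = graph.compute().program(PeerPressureVertexProgram.build().create(graph)).submit().get()
result.graph().traversal().V().valueMap(true)  // the assigned cluster appears among the vertex properties
----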
 
-[[bulkloadingvertexprogram]]
-BulkLoadingVertexProgram
-~~~~~~~~~~~~~~~~~~~~~~~~
+[[bulkloadervertexprogram]]
+BulkLoaderVertexProgram
+~~~~~~~~~~~~~~~~~~~~~~~
 
-The `BulkLoaderVertexProgram` can be used to load graphs of any size (preferably large sized graphs) into a persistent Graph database. The input can be any existing Graph database supporting TinkerPop3 or any of the Hadoop GraphInputFormats (e.g. `GraphSONInputFormat`, `GryoInputFormat` or `ScriptInputFormat`). The following 2 examples show both scenarios in action.
-
-**Load the modern graph from TinkerGraph into Neo4j**
+The `BulkLoaderVertexProgram` provides a generalized way for loading graphs of any size (preferably large sized graphs) into a persistent `Graph`. The input can be any existing `Graph` database supporting TinkerPop3 or any of the Hadoop GraphInputFormats (e.g. `GraphSONInputFormat`, `GryoInputFormat` or `ScriptInputFormat`). The following example demonstrates how to load data from one TinkerGraph to another:
 
 [gremlin-groovy]
 ----
-wgConf = 'conf/neo4j-standalone.properties'
+writeGraphConf = new BaseConfiguration()
+writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
+writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
+writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/tinkergraph.kryo")
 modern = TinkerFactory.createModern()
 blvp = BulkLoaderVertexProgram.build().
            keepOriginalIds(false).
-           writeGraph(wgConf).create(modern)
+           writeGraph(writeGraphConf).create(modern)
 modern.compute().program(blvp).submit().get()
-graph = GraphFactory.open(wgConf)
-g = graph.traversal()
-g.V().valueMap()
-graph.close()
-----
-
-[source,properties]
-----
-# neo4j-standalone.properties
-
-gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
-gremlin.neo4j.directory=/tmp/neo4j
-gremlin.neo4j.conf.node_auto_indexing=true
-gremlin.neo4j.conf.relationship_auto_indexing=true
-----
-
-*Load the Grateful Dead graph from HadoopGraph into TinkerGraph (using Spark)*
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/grateful-dead.kryo', 'data/grateful-dead.kryo')
-wgConf = 'conf/tinkergraph-gryo.properties'
-grateful = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
-blvp = BulkLoaderVertexProgram.build().
-           keepOriginalIds(false).
-           writeGraph(wgConf).create(grateful)
-grateful.compute(SparkGraphComputer).program(blvp).submit().get()
-:set max-iteration 10
-graph = GraphFactory.open(wgConf)
+graph = GraphFactory.open(writeGraphConf)
 g = graph.traversal()
 g.V().valueMap()
 graph.close()
 ----
 
-*Load the Grateful Dead graph from HadoopGraph into TinkerGraph (using Giraph)*
-
-[gremlin-groovy]
-----
-hdfs.copyFromLocal('data/grateful-dead.kryo', 'data/grateful-dead.kryo')
-wgConf = 'conf/tinkergraph-gryo.properties'
-grateful = GraphFactory.open('conf/hadoop/hadoop-grateful-gryo.properties')
-blvp = BulkLoaderVertexProgram.build().
-           keepOriginalIds(false).
-           writeGraph(wgConf).create(grateful)
-grateful.compute(GiraphGraphComputer).program(blvp).submit().get()
-:set max-iteration 10
-graph = GraphFactory.open(wgConf)
-g = graph.traversal()
-g.V().valueMap()
-graph.close()
-----
-
-[source,properties]
-----
-# hadoop-grateful-gryo.properties
-
-#
-# Hadoop Graph Configuration
-#
-gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
-gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
-gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
-gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
-gremlin.hadoop.inputLocation=data/grateful-dead.kryo
-gremlin.hadoop.outputLocation=output
-gremlin.hadoop.deriveMemory=false
-gremlin.hadoop.jarsInDistributedCache=true
-
-#
-# GiraphGraphComputer Configuration
-#
-giraph.minWorkers=1
-giraph.maxWorkers=1
-giraph.useOutOfCoreGraph=true
-giraph.useOutOfCoreMessages=true
-mapred.map.child.java.opts=-Xmx1024m
-mapred.reduce.child.java.opts=-Xmx1024m
-giraph.numInputThreads=4
-giraph.numComputeThreads=4
-giraph.maxMessagesInMemory=100000
-
-#
-# SparkGraphComputer Configuration
-#
-spark.master=local[1]
-spark.executor.memory=1g
-spark.serializer=org.apache.spark.serializer.KryoSerializer
-----
-
-[source,properties]
-----
-# tinkergraph-gryo.properties
-
-gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
-gremlin.tinkergraph.graphFormat=gryo
-gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
-----
+Please see the specific Graph <<implementations, implementation>> sections for information on using the `BulkLoaderVertexProgram` in those contexts.
 
 .Available configuration options
 [width="800px",options="header"]

