tinkerpop-commits mailing list archives

From dkupp...@apache.org
Subject [1/9] incubator-tinkerpop git commit: Made `BulkLoaderVertexProgram` work for any persistent TP3-supporting graph (input and output)
Date Thu, 15 Oct 2015 19:44:11 GMT
Repository: incubator-tinkerpop
Updated Branches:
  refs/heads/tp30 a28a1a8f4 -> 25ca5752b


Made `BulkLoaderVertexProgram` work for any persistent TP3-supporting graph (input and output)


Project: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/commit/3c3abec1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/tree/3c3abec1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/diff/3c3abec1

Branch: refs/heads/tp30
Commit: 3c3abec1c9339e7d5fbbf6537d61ac73b7452a04
Parents: 163d937
Author: Daniel Kuppitz <daniel_kuppitz@hotmail.com>
Authored: Mon Oct 12 22:53:04 2015 +0200
Committer: Daniel Kuppitz <daniel_kuppitz@hotmail.com>
Committed: Mon Oct 12 22:53:04 2015 +0200

----------------------------------------------------------------------
 CHANGELOG.asciidoc                              |   1 +
 docs/preprocessor/preprocess-file.sh            |   2 +-
 docs/src/the-graphcomputer.asciidoc             | 124 +++++++++++++++++++
 .../conf/neo4j-standalone.properties            |  22 ++++
 .../conf/tinkergraph-gryo.properties            |  21 ++++
 .../bulkloading/BulkLoaderVertexProgram.java    |   3 -
 tinkergraph-gremlin/pom.xml                     |  18 +++
 7 files changed, 187 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/CHANGELOG.asciidoc
----------------------------------------------------------------------
diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 09a12d4..7a59b21 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -25,6 +25,7 @@ image::https://raw.githubusercontent.com/apache/incubator-tinkerpop/master/docs/
 TinkerPop 3.0.2 (NOT OFFICIALLY RELEASED YET)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+* Made `BulkLoaderVertexProgram` work for any persistent TP3-supporting graph (input and output).
 * Added a shell script that verifies source and binary distributions.
 * Fixed a bulk related bug in `GroupStep` when used on `GraphComputer` (OLAP).
 * Gremlin Server binary distribution now packages `tinkergraph-gremlin` and `gremlin-groovy` as plugins to be consistent with Gremlin Console's packaging.

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/docs/preprocessor/preprocess-file.sh
----------------------------------------------------------------------
diff --git a/docs/preprocessor/preprocess-file.sh b/docs/preprocessor/preprocess-file.sh
index 7a7cfd1..e2ddc28 100755
--- a/docs/preprocessor/preprocess-file.sh
+++ b/docs/preprocessor/preprocess-file.sh
@@ -58,7 +58,7 @@ if [ $(grep -c '^\[gremlin' ${input}) -gt 0 ]; then
 
   awk -f ${AWK_SCRIPTS}/prepare.awk ${input} |
   awk -f ${AWK_SCRIPTS}/init-code-blocks.awk |
-  awk -f ${AWK_SCRIPTS}/progressbar.awk -v tpl=${AWK_SCRIPTS}/progressbar.groovy.template | HADOOP_GREMLIN_LIBS="${CONSOLE_HOME}/ext/hadoop-gremlin/lib" bin/gremlin.sh |
+  awk -f ${AWK_SCRIPTS}/progressbar.awk -v tpl=${AWK_SCRIPTS}/progressbar.groovy.template | HADOOP_GREMLIN_LIBS="${CONSOLE_HOME}/ext/hadoop-gremlin/lib:${CONSOLE_HOME}/ext/tinkergraph-gremlin/lib" bin/gremlin.sh |
   ${lb} awk -f ${AWK_SCRIPTS}/ignore.awk   |
   ${lb} awk -f ${AWK_SCRIPTS}/prettify.awk |
   ${lb} awk -f ${AWK_SCRIPTS}/cleanup.awk  > ${output}

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/docs/src/the-graphcomputer.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/the-graphcomputer.asciidoc b/docs/src/the-graphcomputer.asciidoc
index 3b65413..75160a5 100644
--- a/docs/src/the-graphcomputer.asciidoc
+++ b/docs/src/the-graphcomputer.asciidoc
@@ -236,6 +236,130 @@ The `PeerPressureVertexProgram` is a clustering algorithm that assigns a nominal
   .. If there is a tie, then the cluster with the lowest `toString()` comparison is selected.
  . Steps 3 and 4 repeat until either a max number of iterations has occurred or no vertex has adjusted its cluster anymore.
 
+[[bulkloadingvertexprogram]]
+BulkLoadingVertexProgram
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `BulkLoaderVertexProgram` can be used to load graphs of any size (preferably large sized graphs) into a persistent Graph database. The input can be any existing Graph database supporting TinkerPop3 or any of the Hadoop GraphInputFormats (e.g. `GraphSONInputFormat`, `GryoInputFormat` or `ScriptInputFormat`). The following 2 examples show both scenarios in action.
+
+**Load the modern graph from TinkerGraph into Neo4j**
+
+[gremlin-groovy]
+----
+wgConf = 'conf/neo4j-standalone.properties'
+modern = TinkerFactory.createModern()
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(modern)
+modern.compute().program(blvp).submit().get()
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# neo4j-standalone.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
+gremlin.neo4j.directory=/tmp/neo4j
+gremlin.neo4j.conf.node_auto_indexing=true
+gremlin.neo4j.conf.relationship_auto_indexing=true
+----
+
+*Load the Grateful Dead graph from HadoopGraph into TinkerGraph (using Spark)*
+
+[gremlin-groovy]
+----
+wgConf = 'conf/tinkergraph-gryo.properties'
+grateful = GraphFactory.open("conf/hadoop/hadoop-grateful-gryo.properties")
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(grateful)
+grateful.compute(SparkGraphComputer).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+*Load the Grateful Dead graph from HadoopGraph into TinkerGraph (using Giraph)*
+
+[gremlin-groovy]
+----
+wgConf = 'conf/tinkergraph-gryo.properties'
+grateful = GraphFactory.open("conf/hadoop/hadoop-grateful-gryo.properties")
+blvp = BulkLoaderVertexProgram.build().
+           keepOriginalIds(false).
+           writeGraph(wgConf).create(grateful)
+grateful.compute(GiraphGraphComputer).program(blvp).submit().get()
+:set max-iteration 10
+graph = GraphFactory.open(wgConf)
+g = graph.traversal()
+g.V().valueMap()
+graph.close()
+----
+
+[source,properties]
+----
+# hadoop-grateful-gryo.properties
+
+#
+# Hadoop Graph Configuration
+#
+gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
+gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
+gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
+gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
+gremlin.hadoop.inputLocation=data/grateful-dead.kryo
+gremlin.hadoop.outputLocation=output
+gremlin.hadoop.deriveMemory=false
+gremlin.hadoop.jarsInDistributedCache=true
+
+#
+# GiraphGraphComputer Configuration
+#
+giraph.minWorkers=1
+giraph.maxWorkers=1
+giraph.useOutOfCoreGraph=true
+giraph.useOutOfCoreMessages=true
+mapred.map.child.java.opts=-Xmx1024m
+mapred.reduce.child.java.opts=-Xmx1024m
+giraph.numInputThreads=4
+giraph.numComputeThreads=4
+giraph.maxMessagesInMemory=100000
+
+#
+# SparkGraphComputer Configuration
+#
+spark.master=local[1]
+spark.executor.memory=1g
+spark.serializer=org.apache.spark.serializer.KryoSerializer
+----
+
+[source,properties]
+----
+# tinkergraph-gryo.properties
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
+----
+
+.Available configuration options
+[width="800px",options="header"]
+|========================================
+|Builder Method    |Purpose | Default Value
+|`bulkLoader(Class\|String)` | Sets the class of the bulk loader implementation. | `IncrementalBulkLoader`
+|`vertexIdProperty(String)` | Sets the name of the property in the target graph that holds the vertex id from the source graph. | `bulkLoader.vertex.id`
+|`keepOriginalIds(boolean)` |Whether to keep the id's from the source graph in the target graph or not. It's recommended to keep them if it's planned to do further bulk loads using the same datasources. | `true`
+|`userSuppliedIds(boolean)` |Whether to use the id's from the source graph as id's in the target graph. If set to `true`, `vertexIdProperty` will be ignored. Note, that the target graph must support user supplied identifiers. | `false`
+|`intermediateBatchSize(int)` |Sets the batch size for intermediate transactions. This is per thread in a multi-threaded environment. +0+ means that transactions will only be committed at the end of an iteration cycle. It's recommended to tune this property for the target graph and not use the default value of +0+. | `0`
+|`writeGraph(String)` | Sets the path to a `GraphFactory` compatible configuration file for the target graph. | _none_
+|========================================
+
 [[traversalvertexprogram]]
 TraversalVertexProgram
 ~~~~~~~~~~~~~~~~~~~~~~

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/gremlin-console/conf/neo4j-standalone.properties
----------------------------------------------------------------------
diff --git a/gremlin-console/conf/neo4j-standalone.properties b/gremlin-console/conf/neo4j-standalone.properties
new file mode 100644
index 0000000..0822711
--- /dev/null
+++ b/gremlin-console/conf/neo4j-standalone.properties
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+gremlin.graph=org.apache.tinkerpop.gremlin.neo4j.structure.Neo4jGraph
+
+gremlin.neo4j.directory=/tmp/neo4j
+gremlin.neo4j.conf.node_auto_indexing=true
+gremlin.neo4j.conf.relationship_auto_indexing=true

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/gremlin-console/conf/tinkergraph-gryo.properties
----------------------------------------------------------------------
diff --git a/gremlin-console/conf/tinkergraph-gryo.properties b/gremlin-console/conf/tinkergraph-gryo.properties
new file mode 100644
index 0000000..5a226f5
--- /dev/null
+++ b/gremlin-console/conf/tinkergraph-gryo.properties
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
+
+gremlin.tinkergraph.graphFormat=gryo
+gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo
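
As an illustration that is not part of this commit: the `writeGraph(String)` option documented above expects a `GraphFactory` compatible properties file such as `tinkergraph-gryo.properties`, and such a file needs a `gremlin.graph` entry naming the graph class. The following minimal Java sketch checks for that entry using only `java.util.Properties`. The class `WriteGraphConfigCheck` and its method `graphClassOf` are hypothetical names introduced here; they do not exist in TinkerPop.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of TinkerPop.
final class WriteGraphConfigCheck {

    // Key that names the Graph implementation in a GraphFactory configuration.
    static final String GRAPH_KEY = "gremlin.graph";

    // Parses properties text and returns the configured graph class name,
    // or throws if the required key is missing or blank.
    static String graphClassOf(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String graphClass = props.getProperty(GRAPH_KEY);
        if (graphClass == null || graphClass.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required key: " + GRAPH_KEY);
        }
        return graphClass.trim();
    }

    public static void main(String[] args) {
        // Same content as the tinkergraph-gryo.properties file added in this commit.
        String conf = String.join("\n",
                "gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph",
                "gremlin.tinkergraph.graphFormat=gryo",
                "gremlin.tinkergraph.graphLocation=/tmp/tinkergraph.kryo");
        System.out.println(graphClassOf(conf));
    }
}
```

The check only confirms that the key is present; whether the named class can actually be loaded still depends on the classpath at the time `GraphFactory.open` is called.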

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java
----------------------------------------------------------------------
diff --git a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java
index d5eb835..7e1e677 100644
--- a/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java
+++ b/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/bulkloading/BulkLoaderVertexProgram.java
@@ -172,9 +172,6 @@ public class BulkLoaderVertexProgram implements VertexProgram<Tuple> {
             graph = GraphFactory.open(configuration.subset(WRITE_GRAPH_CFG_KEY));
             LOGGER.info("Opened Graph instance: {}", graph);
             try {
-                if (!graph.features().graph().supportsConcurrentAccess()) {
-                    throw new IllegalStateException("The given graph instance does not allow concurrent access.");
-                }
                 g = graph.traversal();
             } catch (Exception e) {
                 try {

http://git-wip-us.apache.org/repos/asf/incubator-tinkerpop/blob/3c3abec1/tinkergraph-gremlin/pom.xml
----------------------------------------------------------------------
diff --git a/tinkergraph-gremlin/pom.xml b/tinkergraph-gremlin/pom.xml
index e0a002c..7e68516 100644
--- a/tinkergraph-gremlin/pom.xml
+++ b/tinkergraph-gremlin/pom.xml
@@ -45,6 +45,24 @@ limitations under the License.
             <version>${project.version}</version>
             <scope>test</scope>
         </dependency>
+        <dependency>
+            <groupId>org.apache.tinkerpop</groupId>
+            <artifactId>neo4j-gremlin</artifactId>
+            <version>${project.version}</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.neo4j</groupId>
+            <artifactId>neo4j-tinkerpop-api</artifactId>
+            <version>0.1</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.neo4j</groupId>
+            <artifactId>neo4j-tinkerpop-api-impl</artifactId>
+            <version>0.1-2.2</version>
+            <scope>test</scope>
+        </dependency>
     </dependencies>
     <build>
         <directory>${basedir}/target</directory>
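
The documentation table added in this commit describes `intermediateBatchSize(int)` as a per-thread batch size for intermediate transactions, where `0` means a commit happens only at the end of an iteration cycle. The Java sketch below models that description for a single thread by listing the element positions after which a commit would be issued. It is an illustration under stated assumptions, not code taken from `BulkLoaderVertexProgram`: the class `BatchCommitSketch`, the method `commitPoints`, and the choice to skip the final commit when nothing is pending are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of batched commits; not the actual TinkerPop implementation.
final class BatchCommitSketch {

    // Returns the 1-based element positions after which a commit is issued
    // when one thread processes elementCount elements.
    static List<Integer> commitPoints(int elementCount, int intermediateBatchSize) {
        List<Integer> commits = new ArrayList<>();
        int pending = 0; // elements written since the last commit
        for (int i = 1; i <= elementCount; i++) {
            pending++;
            // A batch size of 0 disables intermediate commits entirely.
            if (intermediateBatchSize > 0 && pending >= intermediateBatchSize) {
                commits.add(i);
                pending = 0;
            }
        }
        // End of the iteration cycle: commit whatever is still pending (assumption).
        if (pending > 0) {
            commits.add(elementCount);
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(10, 0)); // prints [10]
        System.out.println(commitPoints(10, 4)); // prints [4, 8, 10]
    }
}
```

With the default of `0`, all ten elements sit in one transaction until the end, which is why the table recommends tuning the value for the target graph.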

