spark-commits mailing list archives

From sro...@apache.org
Subject spark git commit: [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document
Date Sat, 02 Jul 2016 07:40:27 GMT
Repository: spark
Updated Branches:
  refs/heads/master bad0f7dbb -> 192d1f9cf


[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document

## What changes were proposed in this pull request?

Two test data files used by the GraphX examples live in the "graphx/data" directory.
I moved them into the "data/" directory because the "graphx" directory is meant for code files, and
the other test data files (such as the mllib and streaming test data) are all kept under "data/".

I also updated the GraphX document wherever it references the moved data files.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14010 from WeichenXu123/move_graphx_data_dir.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/192d1f9c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/192d1f9c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/192d1f9c

Branch: refs/heads/master
Commit: 192d1f9cf3463d050b87422939448f2acf86acc9
Parents: bad0f7d
Author: WeichenXu <WeichenXu123@outlook.com>
Authored: Sat Jul 2 08:40:23 2016 +0100
Committer: Sean Owen <sowen@cloudera.com>
Committed: Sat Jul 2 08:40:23 2016 +0100

----------------------------------------------------------------------
 data/graphx/followers.txt        |  8 ++++++++
 data/graphx/users.txt            |  7 +++++++
 docs/graphx-programming-guide.md | 18 +++++++++---------
 graphx/data/followers.txt        |  8 --------
 graphx/data/users.txt            |  7 -------
 5 files changed, 24 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/followers.txt
----------------------------------------------------------------------
diff --git a/data/graphx/followers.txt b/data/graphx/followers.txt
new file mode 100644
index 0000000..7bb8e90
--- /dev/null
+++ b/data/graphx/followers.txt
@@ -0,0 +1,8 @@
+2 1
+4 1
+1 2
+6 3
+7 3
+7 6
+6 7
+3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/data/graphx/users.txt
----------------------------------------------------------------------
diff --git a/data/graphx/users.txt b/data/graphx/users.txt
new file mode 100644
index 0000000..982d19d
--- /dev/null
+++ b/data/graphx/users.txt
@@ -0,0 +1,7 @@
+1,BarackObama,Barack Obama
+2,ladygaga,Goddess of Love
+3,jeresig,John Resig
+4,justinbieber,Justin Bieber
+6,matei_zaharia,Matei Zaharia
+7,odersky,Martin Odersky
+8,anonsys

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 81cf174..e376b66 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -1007,15 +1007,15 @@ PageRank measures the importance of each vertex in a graph, assuming an edge fro
 
 GraphX comes with static and dynamic implementations of PageRank as methods on the [`PageRank` object][PageRank]. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). [`GraphOps`][GraphOps] allows calling these algorithms directly as methods on `Graph`.
 
-GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `graphx/data/users.txt`, and a set of relationships between users is given in `graphx/data/followers.txt`. We compute the PageRank of each user as follows:
+GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in `data/graphx/users.txt`, and a set of relationships between users is given in `data/graphx/followers.txt`. We compute the PageRank of each user as follows:
 
 {% highlight scala %}
 // Load the edges as a graph
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Run PageRank
 val ranks = graph.pageRank(0.0001).vertices
 // Join the ranks with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1032,11 +1032,11 @@ The connected components algorithm labels each connected component of the graph
 
 {% highlight scala %}
 // Load the graph as in the PageRank example
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 // Find the connected components
 val cc = graph.connectedComponents().vertices
 // Join the connected components with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1053,11 +1053,11 @@ A vertex is part of a triangle when it has two adjacent vertices with an edge be
 
 {% highlight scala %}
 // Load the edges in canonical order and partition the graph for triangle count
-val graph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
+val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true).partitionBy(PartitionStrategy.RandomVertexCut)
 // Find the triangle count for each vertex
 val triCounts = graph.triangleCount().vertices
 // Join the triangle counts with the usernames
-val users = sc.textFile("graphx/data/users.txt").map { line =>
+val users = sc.textFile("data/graphx/users.txt").map { line =>
   val fields = line.split(",")
   (fields(0).toLong, fields(1))
 }
@@ -1081,11 +1081,11 @@ all of this in just a few lines with GraphX:
 val sc = new SparkContext("spark://master.amplab.org", "research")
 
 // Load my user data and parse into tuples of user id and attribute list
-val users = (sc.textFile("graphx/data/users.txt")
+val users = (sc.textFile("data/graphx/users.txt")
   .map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) ))
 
 // Parse the edge data which is already in userId -> userId format
-val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt")
+val followerGraph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
 
 // Attach the user attributes
 val graph = followerGraph.outerJoinVertices(users) {

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/graphx/data/followers.txt
----------------------------------------------------------------------
diff --git a/graphx/data/followers.txt b/graphx/data/followers.txt
deleted file mode 100644
index 7bb8e90..0000000
--- a/graphx/data/followers.txt
+++ /dev/null
@@ -1,8 +0,0 @@
-2 1
-4 1
-1 2
-6 3
-7 3
-7 6
-6 7
-3 7

http://git-wip-us.apache.org/repos/asf/spark/blob/192d1f9c/graphx/data/users.txt
----------------------------------------------------------------------
diff --git a/graphx/data/users.txt b/graphx/data/users.txt
deleted file mode 100644
index 982d19d..0000000
--- a/graphx/data/users.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-1,BarackObama,Barack Obama
-2,ladygaga,Goddess of Love
-3,jeresig,John Resig
-4,justinbieber,Justin Bieber
-6,matei_zaharia,Matei Zaharia
-7,odersky,Martin Odersky
-8,anonsys
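
The two moved files use simple line formats: users.txt holds comma-separated
records whose first field is the vertex id and second field is the username,
and followers.txt holds one "srcId dstId" pair per line. As a rough sanity
check that needs no Spark cluster, the sketch below parses both formats in
plain Scala and counts incoming edges per user. It is not part of the commit:
the object name GraphxDataSketch, the helpers parseUsers and parseEdges, and
the choice to embed the file contents as strings are all assumptions made for
illustration.

```scala
// Hypothetical stand-alone sketch, not part of the commit. The file contents
// are embedded as strings rather than read from data/graphx/ through Spark.
object GraphxDataSketch {
  val usersTxt: String =
    """1,BarackObama,Barack Obama
      |2,ladygaga,Goddess of Love
      |3,jeresig,John Resig
      |4,justinbieber,Justin Bieber
      |6,matei_zaharia,Matei Zaharia
      |7,odersky,Martin Odersky
      |8,anonsys""".stripMargin

  val followersTxt: String =
    """2 1
      |4 1
      |1 2
      |6 3
      |7 3
      |7 6
      |6 7
      |3 7""".stripMargin

  // users.txt: first field is the vertex id, second field is the username.
  // Later fields (the display name) are optional and ignored here.
  def parseUsers(text: String): Map[Long, String] =
    text.split("\\r?\\n").map { line =>
      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }.toMap

  // followers.txt: whitespace-separated "srcId dstId", one edge per line.
  def parseEdges(text: String): Seq[(Long, Long)] =
    text.split("\\r?\\n").map { line =>
      val ids = line.trim.split("\\s+")
      (ids(0).toLong, ids(1).toLong)
    }.toSeq

  // Number of incoming edges for each destination vertex id.
  def inDegrees(edges: Seq[(Long, Long)]): Map[Long, Int] =
    edges.groupBy(_._2).map { case (dst, es) => (dst, es.size) }

  def main(args: Array[String]): Unit = {
    val users = parseUsers(usersTxt)
    val degrees = inDegrees(parseEdges(followersTxt))
    // Users with no incoming edge are reported with a count of 0.
    users.toSeq.sortBy(_._1).foreach { case (id, name) =>
      println(s"$name: ${degrees.getOrElse(id, 0)}")
    }
  }
}
```

Note that the last record ("8,anonsys") has only two fields, so a parser that
read a third field unconditionally would fail on it; the sketch reads only the
first two, as the guide's examples do.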


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org

