spark-commits mailing list archives

From r...@apache.org
Subject git commit: Merge pull request #562 from jyotiska/master. Closes #562.
Date Sun, 09 Feb 2014 07:36:51 GMT
Updated Branches:
  refs/heads/master b6d40b782 -> 2ef37c936


Merge pull request #562 from jyotiska/master. Closes #562.

Added example Python code for sort

I added example Python code for sorting. PySpark currently has few examples for newcomers
to the project. This example sorts integers stored in a file. I was able to sort
5 million, 10 million, and 25 million integers with this code.

Author: jyotiska <jyotiska123@gmail.com>

== Merge branch commits ==

commit 8ad8faf6c8e02ae1cd68565d98524edf165f54df
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sun Feb 9 11:00:41 2014 +0530

    Added comments in code on collect() method

commit 6f98f1e313f4472a7c2207d36c4f0fbcebc95a8c
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sat Feb 8 13:12:37 2014 +0530

    Updated python example code sort.py

commit 945e39a5d68daa7e5bab0d96cbd35d7c4b04eafb
Author: jyotiska <jyotiska123@gmail.com>
Date:   Sat Feb 8 12:59:09 2014 +0530

    Added example python code for sort


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/2ef37c93
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/2ef37c93
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/2ef37c93

Branch: refs/heads/master
Commit: 2ef37c93664d74de6d7f6144834883a4a4ef79b7
Parents: b6d40b7
Author: jyotiska <jyotiska123@gmail.com>
Authored: Sat Feb 8 23:36:48 2014 -0800
Committer: Reynold Xin <rxin@apache.org>
Committed: Sat Feb 8 23:36:48 2014 -0800

----------------------------------------------------------------------
 python/examples/sort.py | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/2ef37c93/python/examples/sort.py
----------------------------------------------------------------------
diff --git a/python/examples/sort.py b/python/examples/sort.py
new file mode 100755
index 0000000..5de20a6
--- /dev/null
+++ b/python/examples/sort.py
@@ -0,0 +1,36 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import sys
+
+from pyspark import SparkContext
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 3:
+        print >> sys.stderr, "Usage: sort <master> <file>"
+        exit(-1)
+    sc = SparkContext(sys.argv[1], "PythonSort")
+    lines = sc.textFile(sys.argv[2], 1)
+    sortedCount = lines.flatMap(lambda x: x.split(' ')) \
+                  .map(lambda x: (int(x), 1)) \
+                  .sortByKey()
+    # This is just a demo on how to bring all the sorted data back to a single node.
+    # In reality, we wouldn't want to collect all the data to the driver node.
+    output = sortedCount.collect()
+    for (num, unitcount) in output:
+        print num
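
For readers who want to follow the pipeline without a Spark cluster, the flatMap, map, and sortByKey steps in the diff above can be approximated in plain Python 3. This is only a rough local sketch, not the committed code: the function name sort_integers and the sample input are hypothetical, and list.sort stands in for Spark's distributed sortByKey.

```python
from itertools import chain


def sort_integers(lines):
    """Local, Spark-free approximation of flatMap -> map -> sortByKey."""
    # flatMap step: split every line on single spaces and flatten the
    # results into one stream of tokens.
    tokens = chain.from_iterable(line.split(' ') for line in lines)
    # map step: pair each integer with a dummy value of 1, as the example
    # does, because sortByKey works on (key, value) pairs.
    pairs = [(int(tok), 1) for tok in tokens]
    # sortByKey step: order the pairs by key, ascending.
    pairs.sort(key=lambda kv: kv[0])
    return pairs


if __name__ == "__main__":
    # Hypothetical input: one string per line of the input file.
    sample = ["5 3 9", "1 7", "3"]
    for num, _unit in sort_integers(sample):
        print(num)
```

As in the example itself, a blank line or a run of consecutive spaces yields an empty token, and int() raises ValueError on it, so the input is assumed to be cleanly space-separated.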

