spark-commits mailing list archives

From saru...@apache.org
Subject spark git commit: [SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
Date Mon, 14 Nov 2016 12:08:50 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 26ae5cfa7 -> 666396510


[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB

Changed HDFS default block size from 64MB to 128MB.
https://issues.apache.org/jira/browse/SPARK-18432

Author: Noritaka Sekiyama <moomindani@gmail.com>

Closes #15879 from moomindani/SPARK-18432.

(cherry picked from commit 9d07ceee7860921eafb55b47852f1b51089c98da)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66639651
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66639651
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66639651

Branch: refs/heads/branch-2.0
Commit: 6663965108cff4095cc73e48c2cfb80ac25316f2
Parents: 26ae5cf
Author: Noritaka Sekiyama <moomindani@gmail.com>
Authored: Mon Nov 14 21:07:59 2016 +0900
Committer: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Committed: Mon Nov 14 21:08:29 2016 +0900

----------------------------------------------------------------------
 docs/programming-guide.md | 6 +++---
 docs/tuning.md            | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/66639651/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 204ad5e..6cfb8b4 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -339,7 +339,7 @@ Some notes on reading files with Spark:
 
 * All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
 
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
 
 Apart from text files, Spark's Scala API also supports several other data formats:
 
@@ -371,7 +371,7 @@ Some notes on reading files with Spark:
 
 * All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
 
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
 
 Apart from text files, Spark's Java API also supports several other data formats:
 
@@ -403,7 +403,7 @@ Some notes on reading files with Spark:
 
 * All of Spark's file-based input methods, including `textFile`, support running on directories, compressed files, and wildcards as well. For example, you can use `textFile("/my/directory")`, `textFile("/my/directory/*.txt")`, and `textFile("/my/directory/*.gz")`.
 
-* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
+* The `textFile` method also takes an optional second argument for controlling the number of partitions of the file. By default, Spark creates one partition for each block of the file (blocks being 128MB by default in HDFS), but you can also ask for a higher number of partitions by passing a larger value. Note that you cannot have fewer partitions than blocks.
 
 Apart from text files, Spark's Python API also supports several other data formats:
 

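The bullet changed above says Spark creates one partition per HDFS block by default, and that the optional second argument to `textFile` can only raise that count. As an illustration of the arithmetic only (the `default_partitions` helper below is hypothetical, not a Spark API; actual split computation is done by the Hadoop input format):

```python
import math

def default_partitions(file_size_bytes: int, block_size_bytes: int) -> int:
    # One partition per HDFS block: ceil(file size / block size),
    # with a floor of one partition for tiny files.
    return max(1, math.ceil(file_size_bytes / block_size_bytes))

MB = 1024 * 1024
one_gb_file = 1024 * MB

# Under the old 64MB default documented before this commit, a 1 GB file
# yields 16 partitions; under the 128MB default it yields 8.
print(default_partitions(one_gb_file, 64 * MB))   # 16
print(default_partitions(one_gb_file, 128 * MB))  # 8
```

Per the quoted doc text, passing a second argument such as `sc.textFile(path, 32)` can request more partitions than blocks, but never fewer.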
http://git-wip-us.apache.org/repos/asf/spark/blob/66639651/docs/tuning.md
----------------------------------------------------------------------
diff --git a/docs/tuning.md b/docs/tuning.md
index 9c43b31..0de303a 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -224,8 +224,8 @@ temporary objects created during task execution. Some steps which may be useful
 
 * As an example, if your task is reading data from HDFS, the amount of memory used by the task can be estimated using
   the size of the data block read from HDFS. Note that the size of a decompressed block is often 2 or 3 times the
-  size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 64 MB,
-  we can estimate size of Eden to be `4*3*64MB`.
+  size of the block. So if we wish to have 3 or 4 tasks' worth of working space, and the HDFS block size is 128 MB,
+  we can estimate size of Eden to be `4*3*128MB`.
 
 * Monitor how the frequency and time taken by garbage collection changes with the new settings.
 

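The Eden estimate in the tuning hunk above is plain multiplication; a small sketch of it (the helper name is hypothetical, and the task count and decompression factor are the ones assumed in the quoted doc text):

```python
def eden_estimate_mb(block_size_mb: int, tasks: int = 4,
                     decompression_factor: int = 3) -> int:
    # Eden ~= (tasks' worth of working space) * (decompressed block factor)
    #         * (HDFS block size), as described in the tuning guide.
    return tasks * decompression_factor * block_size_mb

# With the 128 MB default block size: 4 * 3 * 128 MB = 1536 MB of Eden.
print(eden_estimate_mb(128))  # 1536
# The same estimate under the old 64 MB default gave 768 MB.
print(eden_estimate_mb(64))   # 768
```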


