spark-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Tungsten in a mixed endian environment
Date Tue, 12 Jan 2016 15:30:17 GMT
I logged SPARK-12778: endian awareness in Platform.java should help in a
mixed-endian setup.
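
As a rough illustration of what "endian awareness" could mean here, the sketch
below reads and writes 8-byte values in one agreed byte order instead of the
host's native order. It is not the change proposed in SPARK-12778 and not
Spark code: the FixedOrderLongs object, its WireOrder constant, and the choice
of little endian are all assumptions made for the example.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of Spark: every host reads and writes longs
// in the same byte order, so the bytes mean the same thing everywhere.
object FixedOrderLongs {
  // Assumption: the cluster agrees on little endian as the shared order.
  val WireOrder: ByteOrder = ByteOrder.LITTLE_ENDIAN

  // Write `value` into `buf` at `offset` using the shared byte order.
  def putLong(buf: Array[Byte], offset: Int, value: Long): Unit =
    ByteBuffer.wrap(buf).order(WireOrder).putLong(offset, value)

  // Read a long from `buf` at `offset` using the shared byte order.
  def getLong(buf: Array[Byte], offset: Int): Long =
    ByteBuffer.wrap(buf).order(WireOrder).getLong(offset)
}
```

With accessors like these, a value written on a big endian host and read on a
little endian host round-trips unchanged, at the cost of a byte swap on
whichever side does not natively use the shared order.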

There could be other parts of the code base which are related.

Cheers

On Tue, Jan 12, 2016 at 7:01 AM, Adam Roberts <AROBERTS@uk.ibm.com> wrote:

> Hi all, I've been experimenting with DataFrame operations in a mixed
> endian environment - a big endian master with little endian workers. With
> tungsten enabled I'm encountering data corruption issues.
>
> For example, with this simple test code:
>
> import org.apache.spark.SparkContext
> import org.apache.spark._
> import org.apache.spark.sql.SQLContext
>
> object SimpleSQL {
>   def main(args: Array[String]): Unit = {
>     if (args.length != 1) {
>       println("Expected exactly one argument: the master url")
>       sys.exit(1)
>     }
>     val masterURL = args(0)
>     println("Setting up Spark context at: " + masterURL)
>     val sparkConf = new SparkConf
>     val sc = new SparkContext(masterURL, "Unsafe endian test", sparkConf)
>
>     println("Performing SQL tests")
>
>     val sqlContext = new SQLContext(sc)
>     println("SQL context set up")
>     val df = sqlContext.read.json("/tmp/people.json")
>     df.show()
>     println("Selecting everyone's age and adding one to it")
>     df.select(df("name"), df("age") + 1).show()
>     println("Showing all people over the age of 21")
>     df.filter(df("age") > 21).show()
>     println("Counting people by age")
>     df.groupBy("age").count().show()
>   }
> }
>
> Instead of getting
>
> +----+-----+
> | age|count|
> +----+-----+
> |null|    1|
> |  19|    1|
> |  30|    1|
> +----+-----+
>
> I get the following with my mixed endian setup:
>
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |               null|                1|
> |1369094286720630784|72057594037927936|
> |                 30|                1|
> +-------------------+-----------------+
>
> and on another run:
>
> +-------------------+-----------------+
> |                age|            count|
> +-------------------+-----------------+
> |                  0|72057594037927936|
> |                 19|                1|
>
> Is Spark expected to work in such an environment? If I turn off Tungsten
> (sparkConf.set("spark.sql.tungsten.enabled", "false")), I don't see any
> problems in 20 runs.
>
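
A quick check on the numbers in the quoted output: reversing the bytes of the
two unexpected values gives 1 and 19, the count and age that were expected.
That is consistent with 8-byte values being written in one byte order and read
in the other, although it does not by itself show where in Spark this happens.
The ByteSwapCheck object is a name made up for this sketch.

```scala
// Hypothetical stand-alone check, not Spark code: byte-swap the corrupted
// values reported in the quoted message and print the results.
object ByteSwapCheck {
  // Reverse the byte order of a 64-bit value.
  def swap(v: Long): Long = java.lang.Long.reverseBytes(v)

  def main(args: Array[String]): Unit = {
    println(swap(72057594037927936L))   // 0x0100000000000000 -> 1
    println(swap(1369094286720630784L)) // 0x1300000000000000 -> 19
  }
}
```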
