spark-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Tungsten in a mixed endian environment
Date Tue, 12 Jan 2016 21:13:07 GMT

On 12 Jan 2016, at 10:49, Reynold Xin <rxin@databricks.com> wrote:

How big a deal is this use case in a heterogeneous endianness environment? If we do want
to fix it, we should do it right before Spark shuffles data to minimize the performance penalty,
i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire.
This is a pretty involved change and, given other things that might break across heterogeneous
endianness environments, I am not sure it is high priority enough to even warrant review
bandwidth right now.




It's notable that Hadoop doesn't like mixed endianness; there is work (primarily from Oracle)
to make byte swapping consistent, that is, to work reliably on big-endian systems
(https://issues.apache.org/jira/browse/HADOOP-11505). There's no motivation to support
mixed-endian clusters.


The majority of clusters are x86; there are only three CPU families that can be big-endian:
SPARC, Power, ARM. Adam has clearly been playing with Power + x86, but I'd suspect that's
experimentation, not production.

What is probably worth checking is mixed endianness between client apps submitting work and
the servers: Java and Kryo serialization should handle that automatically.
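
To illustrate why stream-based Java serialization is not affected by the host CPU, here is a
minimal sketch. It relies on the documented behaviour of java.io.DataOutputStream, which writes
multi-byte values high byte first on every platform; the class and method names (EndianSketch,
viaDataOutput, viaBuffer, readBuffer) are hypothetical and only for illustration. It does not
touch Spark, Tungsten or Kryo internals. The second half shows what goes wrong when bytes laid
out in one byte order are read back assuming the other, which is the mixed-endian hazard
discussed above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration class; not part of Spark or Hadoop.
final class EndianSketch {

    // DataOutputStream always writes an int high byte first (big-endian),
    // whatever the byte order of the machine running the JVM.
    public static byte[] viaDataOutput(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Lay out an int in an explicitly chosen byte order.
    public static byte[] viaBuffer(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Interpret four raw bytes as an int in the given byte order.
    public static int readBuffer(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println("native order: " + ByteOrder.nativeOrder());

        // Same bytes on x86, Power or SPARC.
        System.out.println("DataOutputStream: " + Arrays.toString(viaDataOutput(value)));

        // Bytes written little-endian, then misread as big-endian on the "other" machine.
        byte[] littleEndian = viaBuffer(value, ByteOrder.LITTLE_ENDIAN);
        int misread = readBuffer(littleEndian, ByteOrder.BIG_ENDIAN);
        System.out.println("misread: 0x" + Integer.toHexString(misread));
        System.out.println("fixed by swapping: 0x"
                + Integer.toHexString(Integer.reverseBytes(misread)));
    }
}
```

The byte swap in the last line is the kind of conversion that would have to happen before data
goes on the wire if native-order buffers were exchanged between machines of different endianness.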