spark-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: Tungsten in a mixed endian environment
Date Tue, 12 Jan 2016 22:27:54 GMT

On 12 Jan 2016, at 10:49, Reynold Xin <rxin@databricks.com> wrote:

How big of a deal is this use case in a heterogeneous endianness environment? If we do want
to fix it, we should do it right before Spark shuffles data to minimize the performance penalty,
i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire.
This is a pretty involved change, and given other things that might break across heterogeneous
endianness environments, I am not sure it is high enough priority to even warrant review
bandwidth right now.



This is a classic problem in distributed computing, and there are two common strategies:


The SunOS RPC strategy: a fixed wire order. For Sun RPC, and hence NFS, the order was that of
the Motorola 68K, so it was cost-free on Sun workstations. SPARC used the same byte ordering,
so again it was free. For x86 parts wanting to play, it was inefficient at both sending and
receiving. Protobuf also has a fixed order, but there it is little-endian:
https://developers.google.com/protocol-buffers/docs/encoding
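A minimal sketch of the fixed-order idea, using Python's struct module. The helper names are hypothetical; the byte orders are the ones described above (Sun's XDR encoding is big-endian, protobuf's fixed32 is little-endian). The point is that the wire bytes are identical no matter which CPU produced them, so a host whose native order differs from the wire pays a swap on every send and every receive.

```python
import struct

# Hypothetical helpers illustrating a single, fixed wire byte order.

def encode_xdr_uint32(value: int) -> bytes:
    """Big-endian (network order) encoding, as used by XDR for Sun RPC/NFS."""
    return struct.pack(">I", value)

def decode_xdr_uint32(data: bytes) -> int:
    return struct.unpack(">I", data)[0]

def encode_fixed32_le(value: int) -> bytes:
    """Little-endian encoding, like protobuf's fixed32 wire type."""
    return struct.pack("<I", value)

def decode_fixed32_le(data: bytes) -> int:
    return struct.unpack("<I", data)[0]
```

For example, `encode_xdr_uint32(0x01020304)` always yields the bytes `01 02 03 04`, whether it runs on SPARC or x86.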

The Apollo RPC / DCE strategy: each packet declares its byte order, and the recipient gets to
deal with it. This is efficient in a homogeneous cluster of either endianness; x86-to-x86
traffic, for example, needs zero byte-swapping. The Apollo design ended up in DCE, which is
what Windows Distributed COM uses (http://pubs.opengroup.org/onlinepubs/9629399/chap14.htm).
If you look at that spec, you can see that floating-point marshalling is what causes the most
trouble.
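A rough sketch of the declare-your-order approach, assuming a hypothetical one-byte order tag at the start of each message (loosely modeled on, but not identical to, DCE's data representation label). The sender always writes in its native order; the receiver swaps only when the tag differs from its own order. Native order is passed in as a parameter so both sides can be simulated on one machine.

```python
import sys

# Hypothetical one-byte order tag; not the actual DCE/NDR wire layout.
BIG, LITTLE = 0, 1

def send(values, native_order=sys.byteorder):
    """Write 32-bit unsigned ints in the sender's native order, tagged with that order."""
    tag = LITTLE if native_order == "little" else BIG
    body = b"".join(v.to_bytes(4, native_order) for v in values)
    return bytes([tag]) + body

def receive(message, native_order=sys.byteorder):
    """Decode a tagged message; returns (values, whether a byte swap was needed)."""
    tag, payload = message[0], message[1:]
    sender_order = "little" if tag == LITTLE else "big"
    needs_swap = sender_order != native_order
    values = []
    for i in range(0, len(payload), 4):
        chunk = payload[i:i + 4]
        if needs_swap:
            chunk = chunk[::-1]  # the byte swap, done only when orders differ
        values.append(int.from_bytes(chunk, native_order))
    return values, needs_swap
```

With matching sender and receiver orders, `receive` never swaps, which is why the strategy costs nothing in a homogeneous cluster.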

Recipient-makes-good is ideal for clusters where the systems all share the same endianness:
the amount of marshalling is guaranteed to be zero if all CPU parts are the same. That's clearly
the de facto strategy in Spark. In contrast, the one-network-format approach is guaranteed to
need zero byte swaps between CPUs whose endianness matches the wire format, and two (one at
each end) between CPUs of the other endianness. For mixed-endian RPC there'll be one bswap, so
the cost is the same as for Apollo DCE.
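The swap counts above can be tabulated with a small sketch. The function names are hypothetical, and the fixed format is assumed to use a big-endian wire order (as with Sun RPC); each end of a fixed-format exchange swaps if and only if its native order differs from the wire, while under recipient-makes-good only the receiver swaps, and only if the two orders differ.

```python
def swaps_fixed_format(sender: str, receiver: str, wire: str = "big") -> int:
    """Byte swaps per value with one fixed wire order: each mismatched end swaps once."""
    return int(sender != wire) + int(receiver != wire)

def swaps_recipient_makes_good(sender: str, receiver: str) -> int:
    """Byte swaps per value when the sender writes natively and the receiver fixes it up."""
    return int(sender != receiver)

for s in ("big", "little"):
    for r in ("big", "little"):
        print(f"{s} -> {r}: fixed={swaps_fixed_format(s, r)}, "
              f"recipient-makes-good={swaps_recipient_makes_good(s, r)}")
```

Both mixed-endian rows cost one swap under either strategy; the only difference is little -> little over a big-endian wire, where the fixed format costs two swaps and recipient-makes-good costs none.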

Bits of Hadoop core do byte-swapping; for performance this is in native code, which has to use
assembly and compiler builtin functions for maximum efficiency.
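To make concrete what that native code computes: a 32-bit byte swap simply reverses the four bytes of a word. In C this is typically expressed with a compiler builtin such as GCC/Clang's `__builtin_bswap32`, which many CPUs can execute as a single instruction. The pure-Python `bswap32` below is a hypothetical illustration of the same shift-and-mask logic, not Hadoop's implementation.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer with shifts and masks."""
    x &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit word
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))
```

Applying it twice returns the original value, and it agrees with reinterpreting the little-endian bytes of `x` as big-endian.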

It's a big patch, one that's designed for effective big-endian support, *ignoring heterogeneous
clusters*:

https://issues.apache.org/jira/secure/attachment/12776247/HADOOP-11505.007.patch

All that stuff cropped up while Alan Burlison was sitting down to get Hadoop working properly
on SPARC. That's a big enough project on its own that worrying about heterogeneous systems
isn't on his roadmap, and nobody else appears to care.

I'd suggest the same to IBM: focus effort & testing on Power + AIX rather than worrying
about heterogeneous systems.

-Steve