hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Burlison (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6417) MapReduceClient's primitives.h is toxic and should be extirpated
Date Fri, 26 Jun 2015 11:21:05 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alan Burlison updated MAPREDUCE-6417:
-------------------------------------
    Description: 
MapReduceClient's primitives.h attempts to provide optimised versions of standard library
memory copy and comparison functions. It has been the subject of several portability-related
bugs:

* HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is needed, doesn't
work on non-x86
* HADOOP-11665 Provide and unify cross platform byteorder support in native code
* MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
* HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to x86
asm statements

At present it only works on x86 and ARM64 as it lacks definitions for bswap and bswap64 for
any platforms other than those.

However it has even more serious problems on non-866 architectures, for example on SPARC simple_memcpy
simply doesn't work at all:

{code}
$ cat bang.cc
#include <string.h>
#define SIMPLE_MEMCPY
#include "primitives.h"
int main(int argc, char **argv)
{
    char b1[9];
    char b2[9];
    simple_memcpy(b2, b1, sizeof(b1));
}
$ gcc -o bang bang.cc && ./bang
Bus Error (core dumped)
{code}
That's because simple_memcpy does pointer fiddling that results in misaligned accesses, which
are illegal on SPARC.

fmemcmp is also broken. Even if a definition of bswap is provided, on big-endian architectures
the result is simply wrong because of its unconditional use of bswap:

{code}
$ cat thud.cc
#include <stdio.h>
#include <strings.h>
#include "primitives.h"
int main(int argc, char **argv)
{
    char a[] = { 0,1,2,0 };
    char b[] = { 0,2,1,0 };
    printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a))));
}
$ g++ -o thud thud.cc && ./thud
65280 -1
{code}

And in addition fmemcmp suffers from the same misalignment issues as simple_memcpy and coredumps
on SPARC when asked to compare odd-sized buffers.

primitives.h provides the following functions:

* bswap() - used in 12 files in MRC but as HADOOP-11505 points out, mostly incorrectly as
it takes no account of platform endianness
* bswap64() - used in 4 files in MRC, same comments as per bswap apply
* simple_memcpy() - used in 3 files in MRC, should be replaced with the standard memcpy()

* fmemcmp() - used in 1 file, should be replaced with the standard memcmp()
* fmemeq() - used in 1 file, should be replaced with the standard memcmp()
* frmemeq() - not used at all, should just be removed

Summary: primitives.h should simply be deleted and replaced with the standard memory copy
& compare functions, or with thin wrappers around them where the APIs are different.

  was:
MapReduceClient's primitives.h attempts to provide optimised versions of standard library
memory copy and comparison functions. It has been the subject of several portability-related
bugs:

* HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is needed, doesn't
work on non-x86
* HADOOP-11665 Provide and unify cross platform byteorder support in native code
* MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
* HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to x86
asm statements

At present it only works on x86 and ARM64 as it lacks definitions for bswap and bswap64 for
any platforms other than those.

However it has even more serious problems on non-866 architectures, for example on SPARC simple_memcpy
simply doesn't work at all:

{code}
$ cat bang.cc
#include <string.h>
#define SIMPLE_MEMCPY
#include "primitives.h"
int main(int argc, char **argv)
{
    char b1[9];
    char b2[9];
    simple_memcpy(b2, b1, sizeof(b1));
}
$ gcc -o bang bang.cc && ./bang
Bus Error (core dumped)
{code}
That's because simple_memcpy does pointer fiddling that results in misaligned accesses, which
are illegal on SPARC.

fmemcmp is also broken. Even if a definition of bswap is provided, on big-endian architectures
the result is simply wrong because of its unconditional use of bswap:

{code}
$ cat thud.cc
#include <stdio.h>
#include <strings.h>
#include "primitives.h"
int main(int argc, char **argv)
{
    char a[] = { 0,1,2,0 };
    char b[] = { 0,2,1,0 };
    printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a))));
}
$ g++ -o thud thud.cc && ./thud
65280 -1
{code}

And in addition fmemcmp suffers from the same misalignment issues as simple_memcpy and coredumps
on SPARC when asked to compare odd-sized buffers.

primitives.h should simply be deleted and replaced with the standard memory copy & compare
functions, or with thin wrappers around them where the APIs are different.


> MapReduceClient's primitives.h is toxic and should be extirpated
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6417
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6417
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: client
>    Affects Versions: 2.7.0
>            Reporter: Alan Burlison
>            Priority: Blocker
>
> MapReduceClient's primitives.h attempts to provide optimised versions of standard library
memory copy and comparison functions. It has been the subject of several portability-related
bugs:
> * HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is needed,
doesn't work on non-x86
> * HADOOP-11665 Provide and unify cross platform byteorder support in native code
> * MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
> * HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM AARCH64 due to
x86 asm statements
> At present it only works on x86 and ARM64 as it lacks definitions for bswap and bswap64
for any platforms other than those.
> However it has even more serious problems on non-866 architectures, for example on SPARC
simple_memcpy simply doesn't work at all:
> {code}
> $ cat bang.cc
> #include <string.h>
> #define SIMPLE_MEMCPY
> #include "primitives.h"
> int main(int argc, char **argv)
> {
>     char b1[9];
>     char b2[9];
>     simple_memcpy(b2, b1, sizeof(b1));
> }
> $ gcc -o bang bang.cc && ./bang
> Bus Error (core dumped)
> {code}
> That's because simple_memcpy does pointer fiddling that results in misaligned accesses,
which are illegal on SPARC.
> fmemcmp is also broken. Even if a definition of bswap is provided, on big-endian architectures
the result is simply wrong because of its unconditional use of bswap:
> {code}
> $ cat thud.cc
> #include <stdio.h>
> #include <strings.h>
> #include "primitives.h"
> int main(int argc, char **argv)
> {
>     char a[] = { 0,1,2,0 };
>     char b[] = { 0,2,1,0 };
>     printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a))));
> }
> $ g++ -o thud thud.cc && ./thud
> 65280 -1
> {code}
> And in addition fmemcmp suffers from the same misalignment issues as simple_memcpy and
coredumps on SPARC when asked to compare odd-sized buffers.
> primitives.h provides the following functions:
> * bswap() - used in 12 files in MRC but as HADOOP-11505 points out, mostly incorrectly
as it takes no account of platform endianness
> * bswap64() - used in 4 files in MRC, same comments as per bswap apply
> * simple_memcpy() - used in 3 files in MRC, should be replaced with the standard memcpy()

> * fmemcmp() - used in 1 file, should be replaced with the standard memcmp()
> * fmemeq() - used in 1 file, should be replaced with the standard memcmp()
> * frmemeq() - not used at all, should just be removed
> Summary: primitives.h should simply be deleted and replaced with the standard memory
copy & compare functions, or with thin wrappers around them where the APIs are different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message