impala-dev mailing list archives

From "Youwei Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.
Date Mon, 23 May 2016 03:13:02 GMT
Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSE or AVX2.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/3/be/src/util/bit-util.inline.h
File be/src/util/bit-util.inline.h:

Line 140:   const __m128i mask = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> Hi Jim. Thank you for providing this pseudocode. Actually, the macro "	#ifn
Hi Jim. I have run some simple tests.
To keep the description short, I define the following test cases:
1. SCALAR: call the function ByteSwapScalar(void* dest, const void* source, int len);
2. SSE4.2[OUTSIDE]: declare "const __m128i mask" outside the function;
3. SSE4.2[INSIDE-STATIC]: declare "const __m128i mask" inside the function WITH the static modifier;
4. SSE4.2[INSIDE-NOT-STATIC]: declare "const __m128i mask" inside the function WITHOUT the static modifier;
5. AVX2[INSIDE-STATIC]: declare "const __m256i mask" inside the function WITH the static modifier;
6. AVX2[INSIDE-NOT-STATIC]: declare "const __m256i mask" inside the function WITHOUT the static modifier.
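For illustration, here is a minimal sketch of cases 1 to 4. It assumes ByteSwapScalar copies the len bytes of source into dest in reverse order, which is what the reverse mask on line 140 implies, and that the two buffers do not overlap. The helper ReverseWithMask and the ByteSwapSse* names are hypothetical; this is not the patch's ByteSwapSIMD<16>. _mm_shuffle_epi8 (pshufb) is an SSSE3 instruction, and every SSE4.2-capable CPU also supports it. The GCC/Clang target attribute lets the file compile without -mssse3.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Case 1 (assumed semantics): dest[i] = source[len - 1 - i]; no overlap allowed.
inline void ByteSwapScalar(void* dest, const void* source, int len) {
  uint8_t* dst = static_cast<uint8_t*>(dest);
  const uint8_t* src = static_cast<const uint8_t*>(source);
  for (int i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

// Hypothetical shared loop: reverse 16-byte blocks taken from the end of src
// with pshufb, then finish the remaining (< 16) bytes with scalar code.
__attribute__((target("ssse3")))
static inline void ReverseWithMask(uint8_t* dst, const uint8_t* src, int len,
                                   __m128i mask) {
  int i = 0;
  for (; i + 16 <= len; i += 16) {
    __m128i block =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + len - i - 16));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                     _mm_shuffle_epi8(block, mask));
  }
  for (; i < len; ++i) dst[i] = src[len - 1 - i];
}

// Case 2, SSE4.2[OUTSIDE]: namespace-scope mask, initialized once.
// _mm_set_epi8 lists bytes from 15 down to 0, so mask byte k holds 15 - k.
const __m128i kReverseMask =
    _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);

__attribute__((target("ssse3")))
void ByteSwapSseOutside(void* dest, const void* source, int len) {
  ReverseWithMask(static_cast<uint8_t*>(dest),
                  static_cast<const uint8_t*>(source), len, kReverseMask);
}

// Case 3, SSE4.2[INSIDE-STATIC]: function-local static mask.
__attribute__((target("ssse3")))
void ByteSwapSseInsideStatic(void* dest, const void* source, int len) {
  static const __m128i mask =
      _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
  ReverseWithMask(static_cast<uint8_t*>(dest),
                  static_cast<const uint8_t*>(source), len, mask);
}

// Case 4, SSE4.2[INSIDE-NOT-STATIC]: plain local mask built on every call.
__attribute__((target("ssse3")))
void ByteSwapSseInsideNotStatic(void* dest, const void* source, int len) {
  const __m128i mask =
      _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
  ReverseWithMask(static_cast<uint8_t*>(dest),
                  static_cast<const uint8_t*>(source), len, mask);
}
```

All three SIMD variants share the same loop, so any timing difference between them can only come from where the mask lives.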
Note: GCC's AVX2 support is not good enough here: declaring "const __m256i mask" outside the function fails to compile, so there is no AVX2[OUTSIDE] case.
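A sketch of case 6 (AVX2[INSIDE-NOT-STATIC]) follows, under the same assumptions as the scalar semantics. The names ByteSwapAvx2 and ByteSwap are hypothetical. _mm256_shuffle_epi8 only shuffles within each 128-bit lane, so a full 32-byte reversal also needs a lane swap after the shuffle. The mask is an ordinary local inside a function compiled with target("avx2"). A likely reason the namespace-scope version fails in GCC is that its initializer is compiled without AVX enabled, but that explanation is an assumption.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar fallback with the assumed semantics: dest = reverse(source).
static void ByteSwapScalar(void* dest, const void* source, int len) {
  uint8_t* dst = static_cast<uint8_t*>(dest);
  const uint8_t* src = static_cast<const uint8_t*>(source);
  for (int i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

// Case 6, AVX2[INSIDE-NOT-STATIC]: the 32-byte mask is a plain local constant.
__attribute__((target("avx2")))
void ByteSwapAvx2(void* dest, const void* source, int len) {
  uint8_t* dst = static_cast<uint8_t*>(dest);
  const uint8_t* src = static_cast<const uint8_t*>(source);
  // vpshufb works per 128-bit lane, so this mask reverses the 16 bytes of each
  // lane; the lane swap below completes the 32-byte reversal.
  const __m256i mask = _mm256_set_epi8(
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
  int i = 0;
  for (; i + 32 <= len; i += 32) {
    __m256i block = _mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(src + len - i - 32));
    __m256i lanes_reversed = _mm256_shuffle_epi8(block, mask);
    // 0x4E picks 64-bit elements (2, 3, 0, 1), i.e. swaps the 128-bit halves.
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i),
                        _mm256_permute4x64_epi64(lanes_reversed, 0x4E));
  }
  for (; i < len; ++i) dst[i] = src[len - 1 - i];  // remaining < 32 bytes
}

// Runtime dispatch so the sketch still runs on CPUs without AVX2.
void ByteSwap(void* dest, const void* source, int len) {
  if (__builtin_cpu_supports("avx2")) {
    ByteSwapAvx2(dest, source, len);
  } else {
    ByteSwapScalar(dest, source, len);
  }
}
```

__builtin_cpu_supports is a GCC/Clang builtin. On CPUs without AVX2, the dispatcher falls back to the scalar loop.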

Test approach:
1. Prepare a uint8_t array of 10,000,000 randomly generated elements;
2. Byte-swap this array 1000 times with each of the 6 approaches and measure the elapsed time;
3. The SSE4.2 cases call ByteSwapSIMD<16>;
4. The AVX2 cases call ByteSwapSIMD<32>.
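The measurement procedure above could be sketched as follows. TimeByteSwap and Speedup are hypothetical helpers and the seed is arbitrary. The sketch also assumes that each PERF figure is the scalar time divided by the variant's time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>
#include <vector>

// Shared signature of the variants (the ByteSwapScalar signature from case 1).
using ByteSwapFn = void (*)(void* dest, const void* source, int len);

// Fills an n-byte buffer with random values, swaps it `iterations` times with
// `fn`, and returns the elapsed wall-clock time in seconds.
double TimeByteSwap(ByteSwapFn fn, int n, int iterations, unsigned seed = 42) {
  std::mt19937 rng(seed);
  // uniform_int_distribution is not defined for 8-bit types, so draw ints.
  std::uniform_int_distribution<int> byte_dist(0, 255);
  std::vector<uint8_t> src(n), dst(n);
  for (uint8_t& b : src) b = static_cast<uint8_t>(byte_dist(rng));

  const auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iterations; ++it) fn(dst.data(), src.data(), n);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a variant relative to the scalar baseline (scalar = 1x).
double Speedup(double scalar_seconds, double variant_seconds) {
  return scalar_seconds / variant_seconds;
}
```

For the setup described here, TimeByteSwap(fn, 10000000, 1000) would be called once per variant. Speedup(scalar_time, variant_time) then yields figures in the same form as the PERF numbers.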

CPU info: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz

Results (speedup relative to the scalar function):
SCALAR PERF: 1.00x
SSE4.2[OUTSIDE] PERF: 3.00x
SSE4.2[INSIDE-STATIC] PERF: 2.75x
SSE4.2[INSIDE-NOT-STATIC] PERF: 2.89x

AVX2[INSIDE-STATIC] PERF: 2.90x
AVX2[INSIDE-NOT-STATIC] PERF: 3.27x

Conclusion: for SSE4.2, the const __m128i mask should be initialized outside the function.

For AVX2, the const __m256i mask should be declared inside the function without the static modifier.


-- 
To view, visit http://gerrit.cloudera.org:8080/3081

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <429222616@qq.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Youwei Wang <429222616@qq.com>
Gerrit-HasComments: Yes
