impala-dev mailing list archives

From "Youwei Wang (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
Date Thu, 18 Aug 2016 12:18:04 GMT
Youwei Wang has posted comments on this change.

Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
......................................................................


Patch Set 40:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3081/40/be/src/util/bit-util.cc
File be/src/util/bit-util.cc:

Line 170:   const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
> 1. I find this doc inscrutable without more labeling. Are the four differen
Hi Jim.
1. I am sorry that my first table was so rough; I was a little lost because of the weird push
issue mentioned on the mailing list. Please read the table as two parts: one part holds the
benchmark results for using the template parameter without a branch, which is colored in blue.
The other part holds the benchmark results for not using the template parameter but with a
branch, which is colored in red.

Each part includes five runs. Each run yields three performance measurements each for FastScalar,
SSSE3, AVX2 and SIMD, so each run gives a single average for each of FastScalar, SSSE3, AVX2
and SIMD. Averaging over all five runs then gives the FINAL average for each of FastScalar,
SSSE3, AVX2 and SIMD.
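
To make the averaging concrete, here is a minimal sketch of how I understand the two levels of
averaging for one implementation. The names (RunSamples, RunAverage, FinalAverage) and the
fixed sizes are hypothetical and only for illustration; the real numbers live in the sheet.

```cpp
#include <array>
#include <cstddef>

// Assumed layout for ONE implementation (e.g. FastScalar) in ONE part of the
// table: five runs, three measurements per run. These sizes are assumptions.
constexpr std::size_t kRuns = 5;
constexpr std::size_t kSamplesPerRun = 3;
using RunSamples = std::array<double, kSamplesPerRun>;

// Average of the measurements taken in a single run.
double RunAverage(const RunSamples& samples) {
  double sum = 0.0;
  for (double s : samples) sum += s;
  return sum / static_cast<double>(samples.size());
}

// FINAL average: the mean of the per-run averages over all runs.
double FinalAverage(const std::array<RunSamples, kRuns>& runs) {
  double sum = 0.0;
  for (const RunSamples& run : runs) sum += RunAverage(run);
  return sum / static_cast<double>(runs.size());
}
```

Calling FinalAverage once per implementation and per part would give the numbers that are
placed side by side in the final table.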

After these two parts were done, I copied the final averages from each part and placed them
side by side to make the comparison easier. So I believe we can draw a quick conclusion by
going through the final table.

I have colored some table columns to make the sheet easier to read. If you are interested, would
you please revisit the sheet link? And please feel free to tell me if anything in the table is
still confusing. Thank you.

2. I have used the objdump tool to check the assembly code in the libUtil.a binary. I have
copied the assembly code of the different implementations of the template function (with and
without the function pointer in the template parameter list) to an online document:
https://docs.google.com/document/d/1bCCjKPg7ytpbRTeC6UrnxoSDHCp0IOAVsQOdcQTrM9M/edit?usp=sharing

As you can see there, the two different codebases generated the same libUtil.a binary (they
have the same md5sum value). Based on this fact, I guess the compiler optimization has taken
care of this issue.
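
For readers who have not followed the whole thread, here is a rough sketch of the two styles
being compared. It is not the code from the patch: SwapScalar, ByteSwapTemplated,
ByteSwapBranching and Impl are hypothetical names, and a plain scalar loop stands in for the
real kernels. Variant A passes the kernel as a function pointer template parameter, so no
branch is needed; variant B picks the kernel with a runtime branch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar kernel: writes the bytes of src into dst in reverse order.
inline void SwapScalar(const uint8_t* src, uint8_t* dst, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) dst[i] = src[len - 1 - i];
}

using SwapFn = void (*)(const uint8_t*, uint8_t*, std::size_t);

// Variant A: the kernel is a template parameter, fixed at compile time.
template <SwapFn Kernel>
void ByteSwapTemplated(const void* source, void* dest, std::size_t len) {
  const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
  uint8_t* dst = reinterpret_cast<uint8_t*>(dest);
  Kernel(src, dst, len);
}

// Variant B: no template parameter; the kernel is chosen by a runtime branch.
// Only one kernel exists in this sketch; the real code has several.
enum class Impl { kScalar };

void ByteSwapBranching(Impl impl, const void* source, void* dest,
                       std::size_t len) {
  const uint8_t* src = reinterpret_cast<const uint8_t*>(source);
  uint8_t* dst = reinterpret_cast<uint8_t*>(dest);
  switch (impl) {
    case Impl::kScalar:
      SwapScalar(src, dst, len);
      break;
  }
}
```

When the kernel is known at the call site, an optimizing compiler is free to inline it in
both variants, which would be consistent with the identical binaries, though this sketch
does not prove that is what happened in the patch.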

Thank you for sharing any of your ideas. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/3081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1
Gerrit-PatchSet: 40
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Youwei Wang <youwei.a.wang@intel.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Youwei Wang <youwei.a.wang@intel.com>
Gerrit-HasComments: Yes
