impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5266 Impala ABM / LZCNT support
Date Fri, 12 May 2017 15:29:13 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5266 Impala ABM / LZCNT support

Patch Set 8:

File be/src/benchmarks/

Line 33: //                   PopcountSlowNoHw               78.8     79.6     80.3    0.0782X
   0.0788X    0.0787X
This should be the baseline for the popcount functions right? I think you'll get the right
relative timings for those if you divide this up into multiple suites, one per function.
File be/src/codegen/CMakeLists.txt:

I don't think this is safe - this tells clang it's safe to emit lzcnt anywhere. Ivy Bridge
has SSE4.2 but not LZCNT from what I can tell and we only check for SSE4.2 before loading
this version of the module (in LlvmCodeGen::CreateFromMemory()).

Currently we have two ways of dealing with these architectural differences - this compile-time
approach where we generate different versions of the module, some of which have the unsupported
instructions removed at compile-time (see be/src/util/sse-util.h) - and a run-time approach
where we apply a target attribute to the individual function and have a runtime check before
calling the function. I think both approaches have downsides so I'm not sure what's right
to do here. For the compile-time approach we need an extra IR module for every configuration.

I wish we had a better solution for cases like this.
File be/src/util/bit-util.h:

Line 323:   static inline int Log2FloorNonZero(uint32_t n) {
> This actually ended up unused, as the 32-bit version of the RoundUpToPowerO
Can we remove it?

Line 387:   static inline int32_t RoundUpToPowerOfTwo(int32_t v) {
Do we need the 32-bit version? I think in most cases we should be doing the kind of calculations
that use this (buffer sizes, etc) in 64-bit arithmetic - having the overloads could lead to
subtle bugs.
File be/src/util/fixed-size-hash-table.h:

Line 53:     DCHECK_EQ(capacity_, BitUtil::RoundUpToPowerOfTwo(static_cast<int64_t>(capacity_)));
> Uses of unsigned integers actually get caught now.
This should really be DCHECK(BitUtil::IsPowerOfTwo()) - I think this was written before that
function existed.
File be/src/util/sse-util.h:

Line 190: #define POPCNT_popcnt_u32 _mm_popcnt_u32
I mentioned this elsewhere but I don't think these are always available when SSE4.2 is present,
so it's not necessarilysafe to have these under the SSE4.2 ifdef.

It looks like POPCNT is (mostly?) available when SSE42 is available:
but LZCNT definitely isn't.

I've also seen some cases where VMs report weird CPU info that doesn't match any real processor
(e.g. "haswell" without AVX support), so we should be a little careful about what we assume.
I can't think of a reason why SSE4.2 would be enabled but POPCNT disabled but who knows.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I9f6a465ab4a9ee4f582847f8e211a779bdede3d2
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <>
Gerrit-Reviewer: Attila Jeges <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Jim Apple <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-Reviewer: Zach Amsden <>
Gerrit-HasComments: Yes

View raw message