impala-issues mailing list archives

From "Jim Apple (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5273) StringCompare is very slow
Date Tue, 02 May 2017 21:47:04 GMT
Jim Apple created IMPALA-5273:
---------------------------------

             Summary: StringCompare is very slow
                 Key: IMPALA-5273
                 URL: https://issues.apache.org/jira/browse/IMPALA-5273
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Jim Apple


Replacing StringCompare (which uses SSE4.2 instructions) with a call to glibc's dynamically dispatched
memcmp results in a >5x speedup for large strings.

On my machine, memcmp mainly uses SSE4.1's ptest, after detecting at run time that SSE4.1
instructions are available. The StringCompare benchmark is 5 years old and is likely out of date
by now.
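
For illustration, a memcmp-based comparison could look roughly like the sketch below. This is not Impala's actual StringCompare: the name CompareWithMemcmp and its signature are hypothetical, and the tie-break rule (on a shared prefix, the shorter string sorts first) is an assumption about the desired ordering.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper, NOT Impala's StringCompare signature.
// Compares the common prefix with memcmp, then breaks ties by length.
// Returns a negative value, zero, or a positive value, like memcmp.
inline int CompareWithMemcmp(const char* s1, std::size_t n1,
                             const char* s2, std::size_t n2) {
  const std::size_t n = std::min(n1, n2);
  // Skip the call for empty input so that null pointers are never passed to memcmp.
  const int result = (n == 0) ? 0 : std::memcmp(s1, s2, n);
  if (result != 0) return result;
  // Equal common prefix: the shorter string orders first (assumed rule).
  return (n1 > n2) - (n1 < n2);
}
```

With this sketch, a predicate such as s <= repeat("a", 2048) corresponds to checking CompareWithMemcmp(...) <= 0. Because glibc selects a memcmp implementation at run time, the instructions actually used depend on the CPU the code runs on.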

To replicate:

{noformat}
create table long_strings (s string) stored as parquet;
insert into long_strings values (repeat("a", 2048));
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, long_strings b;
insert into long_strings select a.s from long_strings a, (select * from long_strings limit 10) b;
select count(*) from long_strings where s <= repeat("a", 2048);
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
