impala-issues mailing list archives

From "Jim Apple (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IMPALA-92) Significant performance difference between LIKE = 'x' AND = 'x'
Date Thu, 16 Mar 2017 04:03:41 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Apple updated IMPALA-92:
----------------------------
    Attachment: like-predicate.h.patch
                like-predicate.cc.patch
                z2

> Significant performance difference between LIKE = 'x' AND = 'x'
> ---------------------------------------------------------------
>
>                 Key: IMPALA-92
>                 URL: https://issues.apache.org/jira/browse/IMPALA-92
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 0.6
>            Reporter: Philip Zeyliger
>            Assignee: Skye Wanderman-Milne
>             Fix For: Impala 0.7
>
>         Attachments: like-predicate.cc.patch, like-predicate.cc.patch, like-predicate.cc.patch, like-predicate.h.patch, like-predicate.h.patch, like-predicate.h.patch, z, z2
>
>
> I'm running the following two queries.  The only difference between them is that one uses "=" and the other uses "LIKE".  Because the LIKE pattern contains no "%", both return the same result.  I was surprised to see an approximately 10x difference in performance between them.
> {code}
> Query: select v1, c, count(*) FROM xxx b, yyy a  WHERE a.v1 = b.file AND v5 = "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
> Returned 89 row(s) in 10.13s
> Query: select v1, c, count(*) FROM xxx b, yyy a  WHERE a.v1 = b.file AND v5 LIKE "hostId" AND v3 = "hosts" GROUP BY v1, c ORDER BY count(*) limit 1000
> Returned 89 row(s) in 93.76s
> {code}
> I'm running
> {code}
> impalad version 0.6 RELEASE (build e675301a90e370f694d700b395a13f0265b7f09c)
> {code}
> I've attached the two query profiles.  The basic difference is in the execution rate:
> {code}
> -    Averaged Fragment 2:(1m27s 0.00%)
> -      completion times: min:1m19s  max:1m32s  mean: 1m28s  stddev:4s545ms
> -      execution rates: min:35.33 MB/sec  max:41.00 MB/sec  mean:37.37 MB/sec  stddev:1.90 MB/sec
> +         - RowsReturnedRate: 9.00 /sec
> +    Averaged Fragment 2:(7s906ms 0.00%)
> +      completion times: min:7s620ms  max:9s495ms  mean: 8s056ms  stddev:653ms
> +      execution rates: min:342.95 MB/sec  max:436.42 MB/sec  mean:409.84 MB/sec  stddev:31.25 MB/sec
> {code}
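As a quick check of the "approximately 10x" figure, the reported wall-clock times and mean execution rates can be compared directly. The numbers below are copied from the report; the variable names and the pairing of the 93.76s run with the 37.37 MB/sec profile are assumptions made for illustration.

```python
# Figures copied from the report above; names are illustrative only.
slow_seconds = 93.76        # slower query, wall-clock time
fast_seconds = 10.13        # faster query, wall-clock time
slow_mb_per_sec = 37.37     # mean execution rate, slower profile
fast_mb_per_sec = 409.84    # mean execution rate, faster profile

# Ratio of elapsed times and ratio of mean scan throughput.
wall_clock_ratio = slow_seconds / fast_seconds
throughput_ratio = fast_mb_per_sec / slow_mb_per_sec

print(f"wall-clock ratio: {wall_clock_ratio:.2f}x")
print(f"throughput ratio: {throughput_ratio:.2f}x")
```

Both ratios land near 10 (a little over 9x in elapsed time and close to 11x in mean throughput), which is consistent with the estimate in the description.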
> Obviously I've fixed my query.
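The report turns on the observation that a LIKE pattern without wildcards selects the same rows as an equality comparison. The following is a minimal sketch of that idea in Python, not Impala's implementation and not the contents of the attached patches: the function name `compile_like`, the backslash escape character, and the regex fallback are all assumptions chosen for illustration.

```python
import re
from typing import Callable


def compile_like(pattern: str, escape: str = "\\") -> Callable[[str], bool]:
    """Build a matcher for a SQL-style LIKE pattern (hypothetical helper).

    If the pattern has no unescaped '%' or '_', the matcher is a plain
    string comparison; otherwise it falls back to a compiled regex.
    """
    literal = []       # pattern text with escapes resolved
    regex_parts = []   # equivalent regular expression, piece by piece
    has_wildcard = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character always matches itself literally.
            i += 1
            literal.append(pattern[i])
            regex_parts.append(re.escape(pattern[i]))
        elif ch == "%":
            has_wildcard = True
            regex_parts.append(".*")   # any run of characters
        elif ch == "_":
            has_wildcard = True
            regex_parts.append(".")    # exactly one character
        else:
            literal.append(ch)
            regex_parts.append(re.escape(ch))
        i += 1

    if not has_wildcard:
        # Fast path: no wildcards, so LIKE reduces to equality.
        constant = "".join(literal)
        return lambda value: value == constant

    compiled = re.compile("".join(regex_parts), re.DOTALL)
    return lambda value: compiled.fullmatch(value) is not None


matches_host_id = compile_like("hostId")   # takes the equality fast path
matches_prefix = compile_like("host%")     # takes the regex path
print(matches_host_id("hostId"), matches_prefix("hostId"))
```

The pattern is classified once, when the matcher is built, so the per-row cost of a wildcard-free LIKE is a single string comparison rather than a regex evaluation.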



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
