hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)
Date Sat, 23 Jul 2016 01:02:20 GMT
Gopal V created HIVE-14318:
------------------------------

             Summary: Vectorization: LIKE should use matches() instead of find(0)
                 Key: HIVE-14318
                 URL: https://issues.apache.org/jira/browse/HIVE-14318
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
    Affects Versions: 1.2.1, 1.3.0, 2.2.0
            Reporter: Gopal V
            Assignee: Gopal V


Checking for a full match with matches() instead of find() would allow the matcher to exit early, instead of going on to look
for matching subsequences at later offsets after the first attempt fails.

In UDFLike.java, the complex-pattern checker uses matches(), whereas the vectorized version uses
find(0), which is more expensive.
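A minimal Java sketch of the two checks, assuming a hypothetical regex `abc.*def` standing in for a LIKE pattern such as 'abc%def'; the class and method names are illustrative and not taken from Hive. It shows that find(0) goes on to try later start offsets, while matches() stays with the attempt at offset 0 and requires the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // Hypothetical regex for a LIKE pattern such as 'abc%def' (not Hive's actual translation).
    static final Pattern LIKE_REGEX = Pattern.compile("abc.*def");

    // find(0): reset the matcher, then search for a matching subsequence
    // starting at offset 0, then 1, 2, ... until one is found or the input runs out.
    static boolean checkWithFind(String s) {
        Matcher m = LIKE_REGEX.matcher(s);
        return m.find(0);
    }

    // matches(): the pattern must cover the entire input, so no later
    // start offsets are tried once the attempt at offset 0 fails.
    static boolean checkWithMatches(String s) {
        return LIKE_REGEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"abcXYZdef", "abcXYZdeg", "xxabcXdefxx"};
        for (String s : inputs) {
            System.out.println(s + " find(0)=" + checkWithFind(s)
                    + " matches()=" + checkWithMatches(s));
        }
    }
}
```

For these inputs, both checks report true for the first string and false for the second, while the third is accepted by find(0) but rejected by matches(), because find(0) picks up the subsequence starting at offset 2. The two checks therefore accept the same inputs only if the regex is anchored at both ends (for example with `\A` and `\z`); the issue does not say how the vectorized regex is built.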

{code}
Benchmark                            Mode  Cnt    Score    Error  Units
RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
{code}

On a miss, the .find(0) version is nearly 3x more expensive per row than the .matches()
check version (497.193 vs 171.872 ns/op).

On a hit, the two are nearly the same (379.316 vs 344.895 ns/op).

With a lazy pattern, matches() is slower when there's a hit (because it has to run the check to the end
of the input), but more than 2x faster when there's a miss.

{code}
Benchmark                            Mode  Cnt    Score    Error  Units
RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
{code}
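Assuming the lazy benchmark pattern is not anchored at the end (the issue does not show the patterns RegexBench uses), this Java sketch illustrates why matches() costs more on a hit: with a lazy `.*?`, find(0) can accept as soon as the prefix matches, while matches() has to stretch the lazy quantifier across the rest of the input. The pattern `abc.*?` and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyHitEnd {
    // Hypothetical lazy pattern with no end anchor; not the pattern used by RegexBench.
    static final Pattern LAZY = Pattern.compile("abc.*?");

    // End offset of the successful find(0) match, or -1 if there is none.
    static int findEnd(String s) {
        Matcher m = LAZY.matcher(s);
        return m.find(0) ? m.end() : -1;
    }

    // End offset of the successful matches() match, or -1 if there is none.
    static int matchesEnd(String s) {
        Matcher m = LAZY.matcher(s);
        return m.matches() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "abc" + "x".repeat(20); // 23 characters; String.repeat needs Java 11+
        // The lazy .*? is satisfied by zero characters, so find(0) accepts right after "abc".
        System.out.println("find(0) match ends at " + findEnd(input));
        // matches() must cover the whole input, so .*? is expanded one character at a time to the end.
        System.out.println("matches() match ends at " + matchesEnd(input));
    }
}
```

On the 23-character input built in main, the sketch prints an end offset of 3 for find(0) and 23 for matches(). On a miss the situation reverses: matches() gives up after the attempt at offset 0, while find(0) keeps retrying at later offsets.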



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
