hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Sprague <sprag...@gmail.com>
Subject Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
Date Fri, 14 Jun 2013 02:37:38 GMT
yeah. good one.   its a bug alright.     where the trouble starts is when a
'?'  follows a wildcard char.  '%' in this case.  Looks like hive slavishly
mutates a '%' to a '.*' and since you got a '?' after that you get another
valid regex looking like '.*?' ('?" being the minimally match directive)
so it eats your '?' as meta-data and then things just go weird from there.

 in your case #2:

       '%?' (aka '.*?')  matches '?&' in your string
       'icid=main' matches 'icid=main'
       '%dl' matches 'moiz%dl'
       '%' matches nothing but thats ok

so you get a true.

*
*
i don't know of a workaround using the like operator.  i've tried various
number of blackslashes in front of the question mark to no avail to make it
a literal question mark and not a meta question mark. maybe someone else on
this list more knowledgeable on the internals can prescribe a solution.

in the meantime use rlike and a real regexp.  use double backslashes to
escape meta characters.

consider this regexp in hive:  'your_string' rlike  '.*\\?icid=main.*dl.*'


as equivalent to: 'your_string' like '%?icid=main%dl%'

let us know how it turns out.

Thanks again for finding and reporting this.

thanks,
Stephen.
PS  Note that these are only _my_ findings and  as such i can certainly be
wrong so please feel free to verify or disprove. I'm using hive 0.80 which
is kinda old.






On Thu, Jun 13, 2013 at 12:30 PM, Sunderlin, Mark <
mark.sunderlin@teamaol.com> wrote:

> Hive users: It took me a while to create a standalone use case, but here
> we go:
>
> In Oracle, this SQL returns false for both cases:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
>   FROM DUAL
> case1 case2
> F       F
>
> But in Hive 0.80, I get one False and one True:
>
> SELECT CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case1,
>        CASE
>           WHEN '?&icid=mainmoiz%dl' LIKE ('%?icid=main%dl%') THEN 'T'
>           ELSE 'F'
>        END
>           AS case2
> FROM DUAL - A dummy single row, single column table we installed
> case1   case2
> F       T
>
>
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935     |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:spragues@gmail.com]
> Sent: Wednesday, June 12, 2013 2:22 PM
> To: user@hive.apache.org
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> Hi Mark,
> i'm running v0.80 too and multiple '%'s work as expected for me. so. we're
> gonna need a see a definitive test case from you.
> show your full string and show where the like clause fails to match.
> thanks,
> Stephen.
> PS here's my test:
>
> hisql>select city from junk;
> +------------------+
> | city             |
> +------------------+
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Haven       |
> | West Jordan      |
> | West Chester     |
> | Westminster      |
> | West Chester     |
> | West Chester     |
> | Westlake         |
> | West Chester     |
> | West Chester     |
> | West Chester     |
> | Weston           |
> | West Covina      |
> | West Chester     |
> | West Chester     |
> | West Hollywood   |
> | West Des Moines  |
> | West Covina      |
> | Westland         |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | West Valley City |
> | Westminster      |
> | West Memphis     |
> | West Memphis     |
> +------------------+
> 28 affected
>
> hisql>select city from junk where city like '%West%v%';
> +-------------+
> | city        |
> +-------------+
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Haven  |
> | West Covina |
> | West Covina |
> +-------------+
> 6 affected
>
> hisql>select city from junk where city like '%West%v%a%';
> +-------------+
> | city        |
> +-------------+
> | West Covina |
> | West Covina |
> +-------------+
> 2 affected
>
>
> On Wed, Jun 12, 2013 at 10:26 AM, Sunderlin, Mark <
> mark.sunderlin@teamaol.com> wrote:
> We are using Hive 0.80.
>
> ---
> Mark E. Sunderlin
> Solutions Architect |AOL NETWORKS BDM
> P: 703-265-6935     |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way | Dulles, VA | 20166
>
> From: Stephen Sprague [mailto:spragues@gmail.com]
> Sent: Wednesday, June 12, 2013 1:00 PM
> To: user@hive.apache.org
> Subject: Re: Issues with Hive 'Like" and multiple '%' wildcard in a pattern
>
> that seems pretty hard to believe. what version of hive are you using?
>
> On Wed, Jun 12, 2013 at 6:27 AM, Sunderlin, Mark <
> mark.sunderlin@teamaol.com> wrote:
> This seems to work just fine in other SQLs, but doesn't seem work in hive.
>
> I need to have several wild card characters in my 'like' clause as follows.
>
> In other SQLs, I want:  ....  where page_url_query like '%?icid=main%dl%'
>
> But in Hive that doesn't match.  I have several work arounds.  I can write
> it as:
>
>  where (page_url_query like '%?icid=main%')  and (page_url_query like
> '%dl%')
>
> or, I can just use something like where
> length(regexp_extract(page__url_query, '.*\?cdid-main.*dl.*',0) <> 0
>
> But more important is two or more '%' wild cards in a Hive like expression
> a known issue?
>
> ---
> Mark E. Sunderlin
> Solutions Architect   |AOL NETWORKS BDM
> P: 703-265-6935       |C: 540-327-6222 | AIM: MESunderlin
> 22000 AOL Way,  Dulles, VA  20166
>
>
>

Mime
View raw message