hive-user mailing list archives

From Jan Dolinár <>
Subject Re: Regexp character classes clarification
Date Thu, 01 Nov 2012 14:32:46 GMT
Hi Neil,

Have you tried to test your regexes in Java? I was using one of the
applets available on the web (e.g.
) to test my expressions before running a hive query and it helped me
a lot...

Usually all you need is to use double escaping, such as:
    select regexp_extract("abc def ghj","\\s(.*)\\s",1) from test limit 1;
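This correctly returns the string "def", i.e. capture group 1 (index 0,
the whole match, would give " def " including the surrounding spaces).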
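
For testing the same pattern in plain Java first, something like the sketch
below may help. The class name and the regexpExtract helper are made up for
illustration; the helper only imitates the argument order of Hive's
regexp_extract using java.util.regex, and returning an empty string when
nothing matches is an assumption of this sketch. Note that a Java string
literal needs the same doubled backslash as the Hive query above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractSketch {
    // Hypothetical helper: (subject, pattern, index), where index is passed
    // straight to Matcher.group(). Returning "" on no match is an assumption
    // of this sketch.
    public static String regexpExtract(String subject, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        // The literal "\\s" reaches the regex engine as \s (whitespace class).
        System.out.println("[" + regexpExtract("abc def ghj", "\\s(.*)\\s", 1) + "]"); // prints [def]
        // Index 0 is the whole match, including the two whitespace characters.
        System.out.println("[" + regexpExtract("abc def ghj", "\\s(.*)\\s", 0) + "]"); // prints [ def ]
        // Without the backslashes the pattern looks for the letter s, which
        // does not occur in this input, so nothing is matched.
        System.out.println("[" + regexpExtract("abc def ghj", "s(.*)s", 1) + "]");     // prints []
    }
}
```

If the pattern behaves as expected here, the same doubly escaped string
should be usable as the second argument of regexp_extract.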

Best regards,

On Thu, Nov 1, 2012 at 3:05 PM, Neil Kodner <> wrote:
> From the hive docs on regexp_extract:
> Note that some care is necessary in using predefined character classes:
> using '\s' as the second argument will match the letter s; '
> s' is necessary to match whitespace, etc. The 'index' parameter is the Java
> regex Matcher group() method index. See
> docs/api/java/util/regex/Matcher.html for more information on the 'index' or
> Java regex group() method.
> This is confusing, especially the line break after s; '. Can anyone explain
> whether character classes work under regexp_extract?
> I'm asking because I've been having some trouble implementing regular
> expression extracts using character classes such as \w. These regular
> expressions are working in some other environments but I can't get them to
> work correctly in hive.
