hive-user mailing list archives

From "Sunderlin, Mark" <mark.sunder...@teamaol.com>
Subject RE: Question on regexp_extract() , Index?
Date Thu, 27 Oct 2011 18:18:58 GMT
Ah, easy enough! And if you have no groups and just want the whole match, it seems an index
of 0 works just fine.

-- assumes you have created a dummy Hive table called dual
select regexp_extract('junk:text:ua123','ua[0-9]+',0) from dual;
ua123


---
Mark E. Sunderlin
Solutions Architect |AOL Data Warehouse
P: 703-256-6935 | C: 540-327-6222
AIM: MESunderlin
22000 AOL Way | Dulles, VA | 20166


-----Original Message-----
From: Mark Grover [mailto:mgrover@oanda.com] 
Sent: Thursday, October 27, 2011 2:10 PM
To: user@hive.apache.org
Subject: Re: Question on regexp_extract() , Index?

Mark,
It specifies the group that you want to extract after the regex has been matched. Group
numbers are 1-based. In the example you gave, the first group corresponds to "the" and the
second group corresponds to "bar". A group in a regex is delimited by a pair of parentheses.

You can also look at the source code to gain further understanding.
http://www.javasourcecode.org/html/open-source/hive/hive-0.7.1/org/apache/hadoop/hive/ql/udf/UDFRegExpExtract.java.html

Matcher is the API that is used to match the Regex. More on that at:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html
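
To make the group numbering concrete, here is a minimal Java sketch of how the index maps onto Matcher.group(). The class name RegexpExtractSketch and the extract() helper are hypothetical names used for illustration only; this is not the Hive source, and returning an empty string when nothing matches is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractSketch {
    // Hypothetical helper that mimics the shape of
    // regexp_extract(subject, pattern, index). Not Hive's actual code.
    public static String extract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        if (m.find()) {
            // group(0) is the whole match; group(1), group(2), ... are the
            // parenthesized groups, counted by their opening parenthesis.
            return m.group(index);
        }
        return ""; // assumption: empty string when the pattern does not match
    }

    public static void main(String[] args) {
        System.out.println(extract("foothebar", "foo(.*?)(bar)", 0)); // foothebar
        System.out.println(extract("foothebar", "foo(.*?)(bar)", 1)); // the
        System.out.println(extract("foothebar", "foo(.*?)(bar)", 2)); // bar
    }
}
```

Note that index 0 returns the entire matched text, so it also works for a pattern that has no groups at all.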

Mark

----- Original Message -----
From: "Mark Sunderlin" <mark.sunderlin@teamaol.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: Thursday, October 27, 2011 1:57:08 PM
Subject: Question on regexp_extract() , Index?

I've been working with the hive regexp_extract(string subject, string pattern, int index)
command.  In the hive language manual, https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions,
the following description for this function is given:

Returns the string extracted using the pattern. e.g. regexp_extract('foothebar', 'foo(.*?)(bar)',
2) returns 'bar.' Note that some care is necessary in using predefined character classes:
using '\s' as the second argument will match the letter s; '\\s' is necessary to match
whitespace, etc. The 'index' parameter is the Java regex Matcher group() method index. See
docs/api/java/util/regex/Matcher.html for more information on the 'index' or Java regex
group() method.

I tried some basic web searches but cannot find what I am looking for: an explanation of what
the index value in the regexp_extract() call does.

Where exactly is the information referred to as "See docs/api/java/util/regex/Matcher.html"?
Is it online somewhere?



