hive-user mailing list archives

From Saurabh S <saurab...@live.com>
Subject Accessing elements from array returned by split() function
Date Thu, 01 Mar 2012 21:19:02 GMT

Hello,

I have a set of URLs which I need to parse. For example, if the URL is
http://www.google.com/anything/goes/here

I need to extract www.google.com, i.e. everything between the second and third forward slashes.

I can't figure out the regex pattern to do so, and am trying to use the split() function instead.
So, my Hive query looks like
select url, split(url,'/')
...

The second column contains the entire array returned by the split function. Is there any way
to access only the second element of the array, which will give me what I need?

When I try the statement select url, split(url,'/')[1] instead, I get an empty second column.

Is this the expected behavior? Any other suggestions on how to parse the URL?
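
For reference, a minimal sketch of how the indexing works out for the example URL, assuming Hive's zero-based array indexing. Splitting on '/' puts an empty string between the two slashes of "//", so index 1 is empty and the host sits at index 2. The table name url_table is a hypothetical placeholder (the original FROM clause is elided above), and the regexp_extract pattern is only one possible way to express "everything between the second and third slash".

```sql
-- url_table is a hypothetical table with a string column named url.
-- For 'http://www.google.com/anything/goes/here', split(url, '/') yields
--   ["http:", "", "www.google.com", "anything", "goes", "here"]
SELECT
  url,
  split(url, '/')[1] AS idx_1,   -- empty string: the gap inside "//"
  split(url, '/')[2] AS idx_2,   -- www.google.com
  -- Regex alternative: skip up to and including "//", then capture up to the next "/".
  regexp_extract(url, '^[^/]*//([^/]+)', 1) AS host_by_regex
FROM url_table;
```

Both idx_2 and host_by_regex assume the URL contains a scheme followed by "//"; a URL without one would shift the array positions and leave the regex without a match.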

Oh, by the way, I'm aware that the function parse_url(url,'HOST') will give me something similar
to what I want, but for some reason that function runs extremely slowly on my database.

This is my first time posting to this list. If I have done anything wrong, please let me know.

Regards,
Saurabh

 		 	   		  