hive-user mailing list archives

From Pradeep Gollakota <pradeep...@gmail.com>
Subject Re: Hive - regexp_replace function for multiple strings
Date Tue, 03 Feb 2015 21:55:36 GMT
I don't think this is doable using the out-of-the-box regexp_replace() UDF.
The way I would do it is to use a file that maps each regexp to its
replacement, and write a custom UDF that loads this file and applies all of
the regular expressions to the input.
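
To make that a bit more concrete, here is a rough sketch of the core logic
in plain Java, with no Hive dependency so it can be tried on its own. The
class name RegexMappingReplacer, the method names, and the one-rule-per-line
"regex<TAB>replacement" file format are all my own assumptions for
illustration, not anything Hive provides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: applies regex -> replacement rules in the order given. */
final class RegexMappingReplacer {
    // Patterns are compiled once, so each input only pays for matching.
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    /** Each line is assumed to be "regex<TAB>replacement"; blank lines are skipped. */
    RegexMappingReplacer(List<String> ruleLines) {
        for (String line : ruleLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            patterns.add(Pattern.compile(parts[0]));
            replacements.add(parts[1]);
        }
    }

    /** Loads the rules from a mapping file (read as UTF-8). */
    static RegexMappingReplacer fromFile(Path mappingFile) throws IOException {
        return new RegexMappingReplacer(Files.readAllLines(mappingFile));
    }

    /** Runs every rule over the input in turn; null stays null, as a UDF would want. */
    String apply(String input) {
        if (input == null) {
            return null;
        }
        String result = input;
        for (int i = 0; i < patterns.size(); i++) {
            // replaceAll treats '$' and '\' in the replacement specially,
            // so "$1$2" refers to capture groups 1 and 2.
            result = patterns.get(i).matcher(result).replaceAll(replacements.get(i));
        }
        return result;
    }
}
```

With a mapping file containing the lines "hip hop<TAB>hiphop" and
"rock music<TAB>rockmusic", apply() would turn "hip hop" into "hiphop" and
"rock music" into "rockmusic" in a single call. To use it from Hive you would
still need to wrap it in a UDF class (for example one extending
org.apache.hadoop.hive.ql.exec.UDF with an evaluate() method) and make the
mapping file available to the tasks. I have not shown that part here.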

Hope this helps.

On Tue, Feb 3, 2015 at 10:46 AM, Viral Parikh <viral.j.parikh@gmail.com>
wrote:

> Hi Everyone,
>
> I am using Hive 0.13. I want to find multiple tokens like "hip hop" and
> "rock music" in my data and replace them with "hiphop" and "rockmusic",
> i.e. the same phrase with the white space removed. I have used the
> regexp_replace function in Hive. Below is my query, and it works great for
> the two examples above.
>
> drop table vp_hiphop;
> create table vp_hiphop as
> select userid, ntext,
>        regexp_replace(regexp_replace(ntext, 'hip hop', 'hiphop'), 'rock music', 'rockmusic') as ntext1
> from vp_nlp_protext_males;
>
> But I have 100 such bigrams/ngrams and want to do the replacement
> efficiently, simply removing the white space. I can pattern match the
> phrases ("hip hop" and "rock music"), but in the replacement I just want
> to strip the white space. Below is what I tried. I also tried using trim
> with regexp_replace, but regexp_replace requires a third argument.
>
> drop table vp_hiphop;
> create table vp_hiphop as
> select userid, ntext,
>        regexp_replace(ntext, '(hip hop)|(rock music)') as ntext1
> from vp_nlp_protext_males;
>
>
