hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: slow performance when using udf
Date Mon, 15 Aug 2011 13:02:41 GMT
On Monday, August 15, 2011, Carl Steinbach <carl@cloudera.com> wrote:
> Converting it to a GenericUDF (i.e. extending GenericUDF instead of UDF)
> should help some with performance.
> On Mon, Aug 15, 2011 at 1:49 AM, wd <wd@wdicc.com> wrote:
>>
>> hi,
>>
>> I created a UDF to decode URL-encoded strings, but the MapReduce job
>> now takes about 3 times as long (73 sec -> 213 sec). How can I optimize it?
>>
>> package com.test.hive.udf;
>>
>> import org.apache.hadoop.hive.ql.exec.UDF;
>> import java.net.URLDecoder;
>>
>> public final class urldecode extends UDF {
>>
>>    public String evaluate(final String s) {
>>        if (s == null) { return null; }
>>        return getString(s);
>>    }
>>
>>    public static String getString(String s) {
>>        String a;
>>        try {
>>            a = URLDecoder.decode(s, "UTF-8");
>>        } catch ( Exception e) {
>>            a = "";
>>        }
>>        return a;
>>    }
>>
>>    public static void main(String args[]) {
>>        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
>>        System.out.println( getString(t) );
>>    }
>> }
>
>

Also, you should use class-level private members to save on object
instantiation and garbage collection.

You can also benefit from matching the argument types to what actually
arrives from upstream. Hive converts Text to String when needed, but if the
data normally coming into the method is Text, you could try declaring the
argument as Text and see if it is any faster.
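
Here is a rough sketch of both ideas, written in plain Java so it runs
without Hive on the classpath. The class name ReusableUrlDecoder and the
(byte[], length) signature are hypothetical stand-ins: in a real UDF the
class would extend org.apache.hadoop.hive.ql.exec.UDF, and evaluate() would
take a Text and read it through getBytes() and getLength(). I have not
benchmarked this, so treat it as something to try rather than a known fix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Plain-Java stand-in for a Hive UDF (assumption: not the real Hive API).
// In Hive, evaluate() would accept org.apache.hadoop.io.Text and pass
// text.getBytes() / text.getLength() to the same loop.
final class ReusableUrlDecoder {

    // Class-level buffer: allocated once, reused for every row.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);

    // Decodes percent-encoded UTF-8 directly from raw bytes.
    // Returns null for null input and "" for malformed escapes,
    // mirroring the behaviour of the UDF quoted above.
    public String evaluate(final byte[] utf8, final int length) {
        if (utf8 == null) {
            return null;
        }
        buffer.reset(); // clear previous row's data, keep the backing array
        for (int i = 0; i < length; i++) {
            final byte b = utf8[i];
            if (b == '+') {
                buffer.write(' ');
            } else if (b == '%') {
                if (i + 2 >= length) {
                    return ""; // truncated escape
                }
                final int hi = Character.digit(utf8[i + 1], 16);
                final int lo = Character.digit(utf8[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    return ""; // not a hex digit
                }
                buffer.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                buffer.write(b);
            }
        }
        // Still allocates one String per row; a real UDF could instead
        // set() a reused class-level Text and return that.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        final ReusableUrlDecoder decoder = new ReusableUrlDecoder();
        final byte[] in = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(decoder.evaluate(in, in.length));
    }
}
```

The working buffer lives for the lifetime of the UDF instance instead of
being created per call, and the input is read as bytes rather than being
converted to a String first.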
