hive-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-6140) trim udf is very slow
Date Sat, 04 Jan 2014 01:05:50 GMT
Thejas M Nair created HIVE-6140:
-----------------------------------

             Summary: trim udf is very slow
                 Key: HIVE-6140
                 URL: https://issues.apache.org/jira/browse/HIVE-6140
             Project: Hive
          Issue Type: Bug
          Components: UDF
            Reporter: Thejas M Nair



Paraphrasing what was reported by [~cartershanklin]:

I used the attached Perl script to generate 500 million two-character strings, each of which
includes a space. I loaded them using:
create table letters (l string); 
load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
Then I ran this SQL script:
select count(l) from letters where l = 'l ';
select count(l) from letters where trim(l) = 'l';

First query (direct comparison) = 170 seconds
Second query (with trim) = 514 seconds, roughly 3 times slower
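
The Perl script is attached to the issue and is not reproduced in this message. For anyone who wants a smaller data set with the same shape, here is a minimal Python sketch of a comparable generator. The function names, the lowercase-letter alphabet, and the random placement of the space (before or after the letter) are all assumptions, not details taken from the original script.

```python
import random
import string


def two_char_strings(count, seed=0):
    """Yield `count` two-character strings, each containing exactly one space."""
    rng = random.Random(seed)
    for _ in range(count):
        letter = rng.choice(string.ascii_lowercase)
        # Assumption: the space may fall on either side of the letter.
        if rng.random() < 0.5:
            yield letter + " "
        else:
            yield " " + letter


def write_sample(path, count, seed=0):
    """Write one generated string per line to `path` (hypothetical helper)."""
    with open(path, "w", newline="\n") as f:
        for s in two_char_strings(count, seed):
            f.write(s + "\n")
```

Calling write_sample("data.csv", 1000000) would produce a file that can be loaded with the same "load data local inpath" statement. Under the assumptions of this sketch the two queries would not return the same count, because trim(l) = 'l' also matches rows where the space comes first.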





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
