hive-dev mailing list archives

From "Anandha L Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6140) trim udf is very slow
Date Sat, 04 Jan 2014 03:07:52 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862177#comment-13862177 ]

Anandha L Ranganathan commented on HIVE-6140:
---------------------------------------------

[~thejas]/[~cartershanklin]

Could you provide the data.csv file that caused the problem? Otherwise, please provide an
example of the data.

> trim udf is very slow
> ---------------------
>
>                 Key: HIVE-6140
>                 URL: https://issues.apache.org/jira/browse/HIVE-6140
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Thejas M Nair
>            Assignee: Anandha L Ranganathan
>
> Paraphrasing what was reported by [~cartershanklin] -
> I used the attached Perl script to generate 500 million two-character strings which always
> included a space. I loaded it using:
> create table letters (l string); 
> load data local inpath '/home/sandbox/data.csv' overwrite into table letters;
> Then I ran this SQL script:
> select count(l) from letters where l = 'l ';
> select count(l) from letters where trim(l) = 'l';
> First query = 170 seconds
> Second query = 514 seconds
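
The Perl script mentioned in the description is attached to the issue and is not reproduced in this message. As a rough stand-in for producing sample data, here is a minimal Python sketch. It assumes each row is one lowercase letter plus exactly one space, with the space either leading or trailing. Those details, and the names generate_rows and write_sample, are illustrative assumptions and not taken from the original script.

```python
import random
import string


def generate_rows(n, seed=0):
    """Yield n two-character strings, each holding one letter and one space.

    Assumption: the space is leading or trailing with equal probability,
    and the letter is drawn from a-z. The original Perl script may differ.
    """
    rng = random.Random(seed)
    for _ in range(n):
        letter = rng.choice(string.ascii_lowercase)
        if rng.random() < 0.5:
            yield letter + " "
        else:
            yield " " + letter


def write_sample(path, n, seed=0):
    """Write n generated rows to path, one row per line."""
    with open(path, "w", newline="\n") as f:
        for row in generate_rows(n, seed):
            f.write(row + "\n")
```

Calling write_sample("data.csv", 1000) would give a small file that can be loaded with the statements quoted above. The reported timings came from 500 million rows, so a much larger n would likely be needed to see a comparable gap.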



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
