hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-266) Improve SerDe performance by using Text instead of String
Date Tue, 14 Apr 2009 20:07:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-266:
----------------------------

    Attachment: HIVE-266.8.patch

Because of the last change (making it compatible with Java primitive-class UDFs), we can now
calculate the hash code from the String (as before) instead of from the Text.
This reverts the changes to sample*.q.out. Here is the updated patch.
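Why the hash basis shows up in sample*.q.out: sampling assigns rows to buckets by hash, so hashing the same value as a String or as raw bytes can select different rows. The sketch below is illustrative only. The byte hash (31-multiplier polynomial seeded at 1) and the bucketing rule (non-negative hash modulo bucket count) are assumptions, and the class and method names are hypothetical, not Hive code.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration: bucketing a value by String.hashCode() versus
 * by a hash over its UTF-8 bytes can put the same value in different buckets.
 */
public class HashBasisDemo {
    /** Assumed byte-based hash: polynomial over the raw bytes, seeded at 1. */
    public static int byteHash(byte[] bytes) {
        int hash = 1;
        for (byte b : bytes) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    /** Assumed bucketing rule: non-negative hash modulo the bucket count. */
    public static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        String value = "abc";
        int fromString = value.hashCode();
        int fromBytes = byteHash(value.getBytes(StandardCharsets.UTF_8));
        // prints "String hash 96354 -> bucket 2"
        System.out.println("String hash " + fromString + " -> bucket " + bucket(fromString, 4));
        // prints "byte hash 126145 -> bucket 1"
        System.out.println("byte hash " + fromBytes + " -> bucket " + bucket(fromBytes, 4));
    }
}
```

Because the two hashes disagree, the same table sampled with 4 buckets would return different rows, which is the kind of difference the reverted sample*.q.out files captured.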


> Improve SerDe performance by using Text instead of String
> ---------------------------------------------------------
>
>                 Key: HIVE-266
>                 URL: https://issues.apache.org/jira/browse/HIVE-266
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Critical
>             Fix For: 0.4.0
>
>         Attachments: HIVE-266.1.patch, HIVE-266.2.patch, HIVE-266.3.patch, HIVE-266.4.patch,
>                      HIVE-266.5.patch, HIVE-266.6.patch, HIVE-266.7.patch, HIVE-266.8.patch
>
>
> A recent performance study showed that two places in the Hive code account for a large share
> of CPU usage:
> 1. String.getBytes() (UTF-8 encoding)
> 2. String.split()
> We should replace String with Text object to:
> 1. Avoid UTF-8 decoding and encoding
> 2. Reuse Text objects instead of creating a new object for each column in each row, as
> String.split() does
> This is expected to give a big (20%+) performance improvement to Hive.
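Both goals in the description can be sketched in plain Java: scan a row's raw UTF-8 bytes for the field delimiter without decoding to String, and record column boundaries in arrays that are reused across rows instead of allocating one object per column. This is a minimal sketch under assumptions, not the patch's code: the class name is hypothetical, Hadoop's Text is not used (a real SerDe would point a reused Text at each byte range), and the example row uses Hive's default Ctrl-A (\u0001) field delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch: find column boundaries in a delimited row of raw bytes
 * without decoding and without per-column allocation. The offset/length
 * arrays are reused for every row and only grow when a row has more columns.
 */
public class ReusableRowSplitter {
    private int[] starts = new int[8];
    private int[] lengths = new int[8];

    /** Splits row[0..len) on a single-byte delimiter; returns the column count. */
    public int split(byte[] row, int len, byte delimiter) {
        int columnCount = 0;
        int start = 0;
        for (int i = 0; i <= len; i++) {
            if (i == len || row[i] == delimiter) {
                ensureCapacity(columnCount + 1);
                starts[columnCount] = start;
                lengths[columnCount] = i - start;
                columnCount++;
                start = i + 1;
            }
        }
        return columnCount;
    }

    public int start(int col) { return starts[col]; }

    public int length(int col) { return lengths[col]; }

    private void ensureCapacity(int n) {
        if (n > starts.length) {
            int newSize = Math.max(n, starts.length * 2);
            starts = Arrays.copyOf(starts, newSize);
            lengths = Arrays.copyOf(lengths, newSize);
        }
    }

    public static void main(String[] args) {
        ReusableRowSplitter splitter = new ReusableRowSplitter();
        byte[] row = "42\u0001hive\u0001text".getBytes(StandardCharsets.UTF_8);
        int n = splitter.split(row, row.length, (byte) 1);
        // Decoding here is only for display; the split itself never builds a String.
        for (int c = 0; c < n; c++) {
            System.out.println(c + ": " + new String(row, splitter.start(c), splitter.length(c), StandardCharsets.UTF_8));
        }
    }
}
```

Unlike String.split(), this sketch keeps trailing empty columns; whether that matches a given SerDe's semantics is a separate design choice.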

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

