hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-625) Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
Date Fri, 10 Jul 2009 10:39:15 GMT

     [ https://issues.apache.org/jira/browse/HIVE-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-625:
----------------------------

    Attachment: HIVE-625.1.patch

An extreme test case shows a big performance improvement.

{code}
  select CAST(rand() * 1024 * 1024 AS INT) as a, rand() as b from mytable cluster by a limit 10;
{code}

The key is an int, and the value is a double. I ran this on an example table.

The mappers of the new code take 98 seconds on average.
The mappers of the old code (without this patch) take 165 seconds on average.
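
To put the two averages side by side, here is a small arithmetic sketch. The variable names are made up for illustration; the only inputs are the 98 and 165 second figures quoted above.

```python
# Average mapper times reported in this comment, in seconds.
old_seconds = 165.0  # old code, without the patch
new_seconds = 98.0   # new code, with HIVE-625.1.patch

# Ratio of old to new time, and the fraction of mapper time saved.
speedup = old_seconds / new_seconds
time_saved_fraction = (old_seconds - new_seconds) / old_seconds

print(f"speedup: {speedup:.2f}x")                     # speedup: 1.68x
print(f"mapper time saved: {time_saved_fraction:.1%}")  # mapper time saved: 40.6%
```

That is roughly a 1.7x speedup on the map side for this particular query, not a general figure.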

Although this is an extreme example, it does show the huge improvement from using the binary serialization format.
Note that the test was done with gzip as mapred.map.output.compression.codec, so the time difference is somewhat exaggerated (compared with the same test using Lzo).
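
For readers unfamiliar with the idea, the following is a rough sketch of the difference between a text encoding of a double and a fixed-width, order-preserving binary encoding. It is not the code in the patch and not Hive's SerDe API: the function names (encode_text, encode_sortable, decode_sortable) are hypothetical, and the bit-flipping scheme is a common technique for making IEEE 754 doubles compare correctly as unsigned bytes, assumed here to be similar in spirit to what a binary sortable format does.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN_BIT = 1 << 63


def encode_text(value: float) -> bytes:
    """Text-style encoding: the decimal string of the double (illustrative only)."""
    return repr(value).encode("ascii")


def encode_sortable(value: float) -> bytes:
    """Fixed 8-byte encoding whose unsigned byte order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & SIGN_BIT:
        bits = ~bits & MASK64  # negative: flip every bit
    else:
        bits ^= SIGN_BIT       # non-negative: flip only the sign bit
    return struct.pack(">Q", bits)


def decode_sortable(data: bytes) -> float:
    """Inverse of encode_sortable."""
    (bits,) = struct.unpack(">Q", data)
    if bits & SIGN_BIT:
        bits ^= SIGN_BIT       # value was non-negative
    else:
        bits = ~bits & MASK64  # value was negative
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value


# Sample values in ascending numeric order.
samples = [-2.5, -0.001, 0.0, 0.7364291038471, 3.0, 1e10]
encoded = [encode_sortable(v) for v in samples]

for v in samples:
    print(v, len(encode_text(v)), len(encode_sortable(v)))
```

The binary form is always 8 bytes and needs no decimal formatting or parsing, whereas the text form varies in length and must be converted to and from a string. Sorting the encoded byte strings gives the same order as sorting the numbers.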

> Use of BinarySortableSerDe for serialization of the value between map and reduce boundary
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-625
>                 URL: https://issues.apache.org/jira/browse/HIVE-625
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-625.1.patch
>
>
> We currently use LazySimpleSerDe, which serializes doubles to text format. Until we have
> LazyBinarySerDe, we should switch to BinarySortableSerDe because it is still much faster
> than LazySimpleSerDe.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

