hive-dev mailing list archives

From "Siying Dong (JIRA)" <>
Subject [jira] Commented: (HIVE-1802) Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
Date Tue, 23 Nov 2010 08:53:13 GMT


Siying Dong commented on HIVE-1802:

Yongqiang, I don't quite follow. One key applies to both group-by and join, and we ARE
only processing those two cases. We are avoiding the array copy in exactly those cases; that
is what this patch does.

Are you suggesting we should optimize other cases as well? It would be nice if we could, but I
haven't found a way to let BinarySortableSerDe use an array copy. The problem is that, to make
the binary sort order match the key order, we need a delimiter between fields, and once there is
a delimiter, strings must be encoded so that any occurrence of it is escaped. Any better ideas?
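To illustrate why a multi-field key needs both a delimiter and escaping, here is a minimal sketch. It assumes a simplified scheme in the spirit of BinarySortableSerDe (it is not Hive's actual code, and it omits details such as null markers): each string field has bytes 0x00 and 0x01 escaped, then is terminated with 0x00. The class and method names (DelimitedKeyDemo, encodeKey, concatKey) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DelimitedKeyDemo {

    // Hypothetical encoder (assumed scheme): escape 0x00 and 0x01,
    // then terminate each field with 0x00.
    static byte[] encodeKey(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            for (byte b : field.getBytes(StandardCharsets.UTF_8)) {
                if (b == 0 || b == 1) {
                    out.write(1);      // escape marker
                    out.write(b + 1);  // 0x00 -> 0x01 0x01, 0x01 -> 0x01 0x02
                } else {
                    out.write(b);      // all other bytes are copied unchanged
                }
            }
            out.write(0); // delimiter: sorts below every escaped or raw byte
        }
        return out.toByteArray();
    }

    // Naive encoding: plain concatenation with no delimiter.
    static byte[] concatKey(String... fields) {
        return String.join("", fields).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ("a", "b") should sort before ("ab", "") because "a" < "ab".
        byte[] k1 = encodeKey("a", "b");   // 61 00 62 00
        byte[] k2 = encodeKey("ab", "");   // 61 62 00 00
        System.out.println(Arrays.compareUnsigned(k1, k2) < 0); // true

        // Without a delimiter, both tuples collapse to the same bytes "ab".
        System.out.println(Arrays.equals(concatKey("a", "b"), concatKey("ab", ""))); // true
    }
}
```

Because the delimiter byte can also appear inside a string, every input byte must be inspected and possibly rewritten, so the serialized key cannot simply be an array copy of the original bytes.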

> Encode MapReduce Shuffling Keys Differently for Single string/bigint Key
> -------------------------------------------------------------------------
>                 Key: HIVE-1802
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-1802.1.patch
> Delimiters are not needed if there is only one shuffling key, and so escaping the
> delimiters is not needed either. We can save some CPU time on serialization and shuffle
> slightly less data, reducing memory footprint and network traffic.
> There is also a bug in group-by: we mistakenly append a -1 to the end of the key, paying
> for one more unnecessary memory copy. It can be easily fixed.
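The description above says a single shuffling key needs neither delimiters nor escaping. The sketch below shows one plausible way to encode such keys; it is an assumption for illustration, not the encoding used in HIVE-1802.1.patch. A lone string key can be its raw UTF-8 bytes, and a lone bigint key can be its big-endian bytes with the sign bit flipped, so that unsigned byte-wise comparison agrees with signed numeric order. The names SingleKeyDemo, encodeStringKey and encodeBigintKey are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SingleKeyDemo {

    // Single string key (assumed scheme): no field follows it, so the raw
    // UTF-8 bytes need no terminator and no per-byte escaping.
    static byte[] encodeStringKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Single bigint key (assumed scheme): flip the sign bit, then write
    // big-endian (ByteBuffer's default order), so negative values sort first.
    static byte[] encodeBigintKey(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key ^ Long.MIN_VALUE).array();
    }

    public static void main(String[] args) {
        // -1 < 0 < 42 numerically, and the encoded bytes sort the same way.
        System.out.println(Arrays.compareUnsigned(encodeBigintKey(-1L), encodeBigintKey(0L)) < 0);  // true
        System.out.println(Arrays.compareUnsigned(encodeBigintKey(0L), encodeBigintKey(42L)) < 0);  // true

        // A shorter prefix still sorts first without any delimiter.
        System.out.println(Arrays.compareUnsigned(encodeStringKey("a"), encodeStringKey("ab")) < 0); // true

        // No extra bytes: two characters encode to two bytes.
        System.out.println(encodeStringKey("ab").length); // 2
    }
}
```

Since nothing follows the key, a prefix naturally sorts before its extensions, which is exactly the property a terminator byte provides in the multi-field case.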

