hadoop-hive-dev mailing list archives

From "Min Zhou (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
Date Sun, 28 Jun 2009 03:18:47 GMT

    [ https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724916#action_12724916 ]

Min Zhou edited comment on HIVE-537 at 6/27/09 8:18 PM:
--------------------------------------------------------

We ran a test on this issue with a dataset of 700m records.

First approach: each distinct count was computed one by one. Each of them needed 119
seconds, which means 10 distinct counts need at least 1190 seconds.
Second approach: distinct keys were distinguished by a tag. 10 distinct counts needed 148
seconds.
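
For readers unfamiliar with the tagging idea, here is a minimal in-memory sketch of it. It is not the Hive plan used in the test above; the class and method names (TaggedDistinctSketch, countDistinct) are hypothetical, and it assumes every row has one string value per column. Each value is prefixed with the index of its column (the tag), so all distinct counts can share one pass over the data and one set of keys, instead of one pass per distinct count. By the timings above, that is 148 seconds against 1190, roughly 8 times faster.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not code from Hive.
class TaggedDistinctSketch {

    // Returns counts[i] = number of distinct values seen in column i,
    // computed in a single pass over the rows.
    static int[] countDistinct(List<String[]> rows, int numColumns) {
        Set<String> seen = new HashSet<>();
        int[] counts = new int[numColumns];
        for (String[] row : rows) {
            for (int tag = 0; tag < numColumns; tag++) {
                // The tag prefix keeps equal values from different columns apart.
                String key = tag + ":" + row[tag];
                if (seen.add(key)) {
                    counts[tag]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "x"},
            new String[] {"a", "y"},
            new String[] {"b", "x"});
        System.out.println(Arrays.toString(countDistinct(rows, 2)));
    }
}
```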

      was (Author: coderplay):
    we've done a test about this issue, dataset: 700m records.

first approach, each distinct count needs 119 seconds, that's means 10 distinct count needs
at least  1190 seconds.
second approach where distinct keys were distinguished by a tag,  10 distinct count need 148
seconds.
  
> Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-537
>                 URL: https://issues.apache.org/jira/browse/HIVE-537
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>
> There are already some cases inside the code where we use heterogeneous data: JoinOperator
> and UnionOperator (in the sense that different parents can pass in records with different
> ObjectInspectors).
> We currently use the Operator's parentID to distinguish them. However, that approach does
> not extend to more complex plans that might be needed in the future.
> We will support the union type like this:
> {code}
> TypeDefinition:
>   type: primitivetype | structtype | arraytype | maptype | uniontype
>   uniontype: "union" "<" tag ":" type ("," tag ":" type)* ">"
> Example:
>   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
> Example of serialized data format:
>   We will first store the tag byte before we serialize the object. On deserialization,
>   we will first read the tag byte, which tells us the type of the object that follows,
>   so we can deserialize it successfully.
> Interface for ObjectInspector:
> interface UnionObjectInspector {
>   /** Returns the array of OIs that are for each of the tags
>    */
>   ObjectInspector[] getObjectInspectors();
>   /** Return the tag of the object.
>    */
>   byte getTag(Object o);
>   /** Return the field based on the tag value associated with the Object.
>    */
>   Object getField(Object o);
> };
> {code}
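
The serialized format described in the issue (tag byte first, then the object) can be sketched as follows. This is only an illustration under stated assumptions, not Hive's SerDe code: the class TaggedUnionSketch and its methods are hypothetical, the schema is assumed to be union<0:int,1:double>, and the values are written with java.io.DataOutputStream rather than any Hive serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; assumes the schema union<0:int,1:double>.
class TaggedUnionSketch {

    // Writes the tag byte first, then the value in the encoding chosen by the tag.
    static byte[] serialize(byte tag, Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(tag);
            switch (tag) {
                case 0: out.writeInt((Integer) value); break;
                case 1: out.writeDouble((Double) value); break;
                default: throw new IllegalArgumentException("unknown tag " + tag);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the tag byte first; the tag tells us how to decode what follows.
    static Object deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            switch (tag) {
                case 0: return in.readInt();
                case 1: return in.readDouble();
                default: throw new IllegalArgumentException("unknown tag " + tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(deserialize(serialize((byte) 0, 42)));
        System.out.println(deserialize(serialize((byte) 1, 2.5)));
    }
}
```

In terms of the proposed UnionObjectInspector, the byte read at the start corresponds to what getTag would return, and the decoded value to what getField would return.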

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

