pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1876) Typed map for Pig
Date Thu, 03 Mar 2011 17:51:37 GMT

    [ https://issues.apache.org/jira/browse/PIG-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002100#comment-13002100 ]

Alan Gates commented on PIG-1876:

I assume at the end when the schema for b is listed as {chararray} it really should be {int}.

Syntax and semantics look good.

Are there any error conditions we need to think about?  The only one I could come up with
is the case where the values in the map aren't of the indicated type, but I assume we'll
handle this just as we do when a top-level type isn't what was declared.

This will drive changes in the LoadCaster interface.  Those should be specified here as well.
Do we have any plans to minimize backward compatibility issues for users on that?

> Typed map for Pig
> -----------------
>                 Key: PIG-1876
>                 URL: https://issues.apache.org/jira/browse/PIG-1876
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.9.0
> Currently the Pig map type is untyped, which means a map value is always of bytearray (i.e.
> unknown) type. In PIG-1277, we allowed an unknown type to be a shuffle key, which somewhat
> relieves the problem. However, a typed map is still beneficial in that:
> 1. Users can make semantic use of the map value type. Currently, users need to explicitly
> cast map values, which is ugly.
> 2. Though PIG-1277 allows an unknown type to be a shuffle key, performance suffers. We don't
> have a raw comparator for the unknown type; instead, we need to instantiate the value object
> and invoke its comparator.
> Here is the proposed syntax for a typed map:
> map[type]
> A typed map can be used anywhere an untyped map could occur. For example:
> a = load '1.txt' as(map[int]);
> b = foreach a generate (map[(i:int)])a0;  -- Map value is tuple
> b = stream a through `cat` as (m:map[{(i:int,j:chararray)}]);  -- Map value is bag
> A MapLookup on a typed map will result in the datatype of the map value.
> a = load '1.txt' as(map[int]);
> b = foreach a generate $0#'key';
> Schema for b:
> b: {chararray}
> The behavior of the untyped map will remain the same.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira