pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-1822) Need to document how types work in Pig
Date Fri, 15 Apr 2011 19:02:05 GMT

    [ https://issues.apache.org/jira/browse/PIG-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020398#comment-13020398 ]

Olga Natkovich commented on PIG-1822:

Please append the following subsection to the end of the Schemas section:

How Pig Handles Schema

As you can see above, with a few exceptions, Pig can infer the schema of a relation up front.
You can see the schema of a particular relation via describe. Pig enforces this computed
schema during the actual computation by casting the input data to the expected data type.
If the process is successful, the results are returned to the user; otherwise, a warning is
generated for each record that failed to convert. Note that Pig does not know the actual types
of the data up front; it determines them at runtime and performs the right conversion on the fly.
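
The toy Python model below makes the enforcement step concrete. It is not Pig's implementation. The `CASTS` table and `enforce_schema` function are hypothetical names. Input fields are assumed to arrive as strings, standing in for Pig's untyped bytearray. A failed cast is assumed to yield `None`, in the spirit of Pig's null, and one warning is counted per failing record, following the description above.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical converters standing in for Pig's casts from untyped input.
CASTS: Dict[str, Callable[[str], Any]] = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "chararray": str,
}


def enforce_schema(
    records: List[Tuple[str, ...]], schema: List[str]
) -> Tuple[List[Tuple[Any, ...]], int]:
    """Cast every field to the type the schema expects.

    A field that cannot be converted becomes None (a stand-in for Pig's
    null), and one warning is counted for each record with a failure.
    """
    out = []
    warnings = 0
    for rec in records:
        converted = []
        failed = False
        for raw, typ in zip(rec, schema):
            try:
                converted.append(CASTS[typ](raw))
            except ValueError:
                converted.append(None)  # conversion failed for this field
                failed = True
        if failed:
            warnings += 1
        out.append(tuple(converted))
    return out, warnings


rows, warns = enforce_schema([("1", "2.5"), ("x", "3")], ["int", "double"])
print(rows, warns)  # [(1, 2.5), (None, 3.0)] 1
```

The second record still flows through; only its unconvertible field is replaced, and the failure is reported as a warning rather than an error.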

Having a deterministic schema is very powerful; however, sometimes it comes at the cost of
performance. Consider the following example:

A = load 'input' as (x, y, z);  -- no types declared, so x, y, z are bytearray
B = foreach A generate x+y;     -- bytearray operands are cast to double for +

If you run describe on B, you will see a single column of type double. This is because Pig
makes the safest choice and uses the largest numeric type when the schema is not known. In
practice, the input data may contain only integer values; however, Pig will still cast the data
to double and make sure that a double result is returned. This extra conversion is the
performance cost mentioned above.
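
The inference rule just described can be modeled in a few lines of Python. This is a simplified sketch, not Pig's type checker. `result_type_of_add` and `NUMERIC_ORDER` are hypothetical names. Only the four numeric types plus an untyped (bytearray) operand are handled, and the wider of two declared types is assumed to win.

```python
from typing import Optional

# Numeric types ordered from narrowest to widest: int < long < float < double.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def result_type_of_add(left: Optional[str], right: Optional[str]) -> str:
    """Infer the type of `left + right` in this simplified model.

    An operand with no declared type (None or "bytearray") is treated as
    double, the safest numeric choice; otherwise the wider type wins.
    """
    widths = []
    for t in (left, right):
        if t is None or t == "bytearray":
            t = "double"  # unknown type: assume the largest numeric type
        widths.append(NUMERIC_ORDER.index(t))
    return NUMERIC_ORDER[max(widths)]


print(result_type_of_add(None, None))     # double (as (x, y, z))
print(result_type_of_add("int", "int"))   # int    (as (x:int, y:int, z:int))
print(result_type_of_add("int", "long"))  # long
```

The `int` case suggests how the cost can be avoided. When the types are declared up front in the load statement, the addition can stay in integer arithmetic instead of being widened to double.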

If the schema of a relation can't be inferred, Pig will just use the runtime data as is
and propagate it through the pipeline.
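
For contrast, this toy Python sketch shows the schema-less path. When no schema is available, records are forwarded unchanged, so the same field can hold values of different runtime types from one record to the next. `propagate` and `CASTS` are hypothetical names. The casting branch is a simplified stand-in for schema enforcement and omits the per-record warnings.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical converters used only when a schema is known.
CASTS: Dict[str, Callable[[Any], Any]] = {"int": int, "double": float, "chararray": str}


def propagate(
    records: List[Tuple[Any, ...]], schema: Optional[List[str]]
) -> List[Tuple[Any, ...]]:
    """Pass records on to the next pipeline stage.

    With a known schema each field is cast to its declared type; with no
    schema (None) the runtime values are forwarded unchanged.
    """
    if schema is None:
        return list(records)  # nothing to enforce: use the data as is
    return [tuple(CASTS[t](v) for v, t in zip(rec, schema)) for rec in records]


data = [("1", 2), ("3", 4.5)]
print(propagate(data, None))               # [('1', 2), ('3', 4.5)]
print(propagate(data, ["int", "double"]))  # [(1, 2.0), (3, 4.5)]
```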

> Need to document how types work in Pig
> --------------------------------------
>                 Key: PIG-1822
>                 URL: https://issues.apache.org/jira/browse/PIG-1822
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Olga Natkovich
>            Assignee: Olga Natkovich
>             Fix For: 0.9.0
> What is static and what is dynamic.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
