pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2824) Pushing checking number of fields into LoadFunc
Date Sun, 02 Sep 2012 22:49:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447047#comment-13447047 ]

Dmitriy V. Ryaboy commented on PIG-2824:
----------------------------------------

Jie, that's a good catch and a nice perf improvement, but the solution seems a bit heavyweight.

What if we instead modified POLoad to perform this check automatically and made it aware of
the expected schema?

                
> Pushing checking number of fields into LoadFunc
> -----------------------------------------------
>
>                 Key: PIG-2824
>                 URL: https://issues.apache.org/jira/browse/PIG-2824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Jie Li
>         Attachments: 2824.patch, 2824.png
>
>
> As described in PIG-1188, if users define a schema (with or without types), we need to check
> the number of fields after loading data: if there are fewer fields, we need to pad with null
> fields, and if there are more fields, we need to throw the extras away.
> For a schema with types, Pig used to insert a Foreach after the loader for type casting,
> which also checks the number of fields. For a schema without types there was no such Foreach,
> so PIG-1188 inserted one just to check the number of fields. Unfortunately, a Foreach is too
> expensive for such a check, and ideally we can push it into the loader.
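
For illustration only, the pad-or-truncate check described above could be sketched as follows.
This is plain Python, not Pig's LoadFunc or POLoad API; the names conform_fields and load_lines
are hypothetical, and a tab delimiter is assumed.

```python
from typing import Any, Iterable, Iterator, List, Optional, Sequence


def conform_fields(fields: Sequence[Any], expected: int) -> List[Optional[Any]]:
    """Return exactly `expected` fields: drop extras, pad missing ones with None."""
    if expected < 0:
        raise ValueError("expected must be non-negative")
    row = list(fields[:expected])               # throw away surplus fields
    row.extend([None] * (expected - len(row)))  # pad short rows with nulls
    return row


def load_lines(lines: Iterable[str], expected: int,
               delimiter: str = "\t") -> Iterator[List[Optional[str]]]:
    """Hypothetical loader: split each line and conform it while loading,
    so that no separate per-record pass is needed afterwards."""
    for line in lines:
        yield conform_fields(line.rstrip("\n").split(delimiter), expected)
```

The point of the sketch is that the field-count check runs inside the loading loop itself,
which is the effect the issue asks for in place of a dedicated Foreach.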

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
