hadoop-pig-dev mailing list archives

From "Yan Zhou (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time
Date Tue, 09 Mar 2010 16:59:27 GMT

     [ https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1207:
--------------------------

    Attachment: PIG-1207.patch

The same patch, based on the current trunk.

> [zebra] Data sanity check should be performed at the end of writing instead of later at query time
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1207
>                 URL: https://issues.apache.org/jira/browse/PIG-1207
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>         Attachments: PIG-1207.patch, PIG-1207.patch
>
>
> Currently the equality check on the number of rows across different column groups is
> performed by the query. The error info is sketchy: the query only emits "Column groups
> are not evenly distributed" or, worse, throws an IndexOutOfBounds exception from
> CGScanner.getCGValue. This happens because BasicTable.atEnd and BasicTable.getKey, which
> are called just before BasicTable.getValue, only check the first column group in the
> projection. Any discrepancy in the number of rows per file across the column groups in
> the projection can have BasicTable.atEnd return false and BasicTable.getKey return a key
> normally while another column group has already exhausted its current file, so the call
> to its CGScanner.getCGValue throws the exception.
> This check should also be performed at the end of writing, and the error info should be
> more informative.
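
As a rough illustration of the check described above (not the contents of PIG-1207.patch), a write-time verification could compare the per-file row counts of every column group against those of the first one and report each mismatch by name. The class RowCountSanityCheck, its methods, and the map of column group names to per-file row counts are all hypothetical; none of them are part of the Zebra API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical write-time sanity check; not part of the Zebra API. */
final class RowCountSanityCheck {

    /**
     * Compares every column group against the first one.
     *
     * @param rowsPerFileByGroup column group name mapped to the row counts
     *                           of its files, in file order (assumed input)
     * @return one description per mismatch; empty if all groups agree
     */
    static List<String> findMismatches(Map<String, long[]> rowsPerFileByGroup) {
        List<String> problems = new ArrayList<>();
        String refName = null;
        long[] ref = null;
        for (Map.Entry<String, long[]> e : rowsPerFileByGroup.entrySet()) {
            if (ref == null) {            // first group is the reference
                refName = e.getKey();
                ref = e.getValue();
                continue;
            }
            long[] cur = e.getValue();
            if (cur.length != ref.length) {
                problems.add(String.format(
                    "column group '%s' has %d files but '%s' has %d",
                    e.getKey(), cur.length, refName, ref.length));
                continue;                 // per-file comparison is meaningless here
            }
            for (int i = 0; i < ref.length; i++) {
                if (cur[i] != ref[i]) {
                    problems.add(String.format(
                        "file %d: column group '%s' has %d rows but '%s' has %d",
                        i, e.getKey(), cur[i], refName, ref[i]));
                }
            }
        }
        return problems;
    }

    /** Fails at the end of writing, listing every mismatch found. */
    static void verify(Map<String, long[]> rowsPerFileByGroup) throws IOException {
        List<String> problems = findMismatches(rowsPerFileByGroup);
        if (!problems.isEmpty()) {
            throw new IOException("Column groups are not evenly distributed: "
                + String.join("; ", problems));
        }
    }
}
```

Calling verify once the writer has closed all of its files would surface the discrepancy immediately, with the offending column group and file index in the message, rather than as an out-of-bounds failure at query time.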

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

