hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
Date Tue, 13 Oct 2009 19:12:31 GMT

    [ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765192#action_12765192 ]

Pradeep Kamath commented on PIG-1014:
-------------------------------------

The issue I see is with the implementation of COUNT today. It looks only at the first field
of each tuple in the bag and counts only non-null values towards the result. This can lead to
surprising results. Consider a relation (A) with two fields and the following contents:
{noformat}
1 2
3 4
null 6
7 null
null null
{noformat}

If we have the following snippet:
{code}
B = group A all;
C = foreach B generate COUNT(A);
{code}

The answer is 3, which is arrived at by considering only records 1, 2 and 4, since
the other records have null in the first position. Ironically, although record 4 has null in
the second position, that does not prevent it from being counted. So basing the result
on the null-ness of just the first field seems somewhat arbitrary. My concern is that most
users would not know that the result was arrived at *after* dropping records which had null
in the first field, even though they did not specify COUNT(A.$0). The status quo means we equate
COUNT(A) to COUNT(A.$0), which is also not apparent to users.
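
To make the arithmetic concrete, here is a small Python sketch (not Pig code) that models the behaviour described in this comment on the five-record relation. The function names are hypothetical and only illustrate the semantics as I understand them: None stands in for Pig's null, count_first_field mimics what COUNT(A) does today, and count_star mimics the proposed count of all records.

```python
# Relation A from the comment; None stands in for Pig's null.
A = [(1, 2), (3, 4), (None, 6), (7, None), (None, None)]


def count_first_field(bag):
    """Model of today's COUNT(A): count tuples whose FIRST field is non-null."""
    return sum(1 for record in bag if record[0] is not None)


def count_column(bag, index):
    """Model of COUNT(A.$index): count non-null values in one column."""
    return sum(1 for record in bag if record[index] is not None)


def count_star(bag):
    """Model of the proposed behaviour: count every record, nulls included."""
    return len(bag)


if __name__ == "__main__":
    print(count_first_field(A))  # 3: records 1, 2 and 4
    print(count_column(A, 0))    # 3: same as above, i.e. COUNT(A) == COUNT(A.$0)
    print(count_star(A))         # 5: all records
```

In this model the current COUNT(A) and COUNT(A.$0) are indistinguishable, which is the point of the concern above; only the count-all variant returns 5.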

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted
> without considering nullness of the fields in the records
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1014
>                 URL: https://issues.apache.org/jira/browse/PIG-1014
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Pradeep Kamath
>



