hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
Date Mon, 12 Oct 2009 17:28:31 GMT

    [ https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764755#action_12764755 ]

Pradeep Kamath commented on PIG-1014:
-------------------------------------

This JIRA tracks whether Pig can automatically convert COUNT(relation) in a script to
COUNT_STAR(relation) in the plan, so that the nullness of the fields in the records is not
considered when computing the count. For example, suppose a relation (A) has two fields and the
script contains the following snippet:
{noformat}
B = group A by $0;
C = foreach B generate group, COUNT(A);
{noformat}
This is equivalent to a COUNT(*) after grouping on the first column in SQL. Per SQL semantics,
COUNT(*) counts all records in the group regardless of whether individual fields are null. In
Pig, this behavior is provided by the COUNT_STAR built-in. The COUNT built-in, however, is meant
for counting a bag with a single column (for example, COUNT(A.$0) in the script above). Its
implementation therefore checks whether the first field of each tuple in the bag is null and
counts only non-null values. In the script above, any record whose first column is null is not
counted, so the result differs from what COUNT(*) would return in SQL. If Pig's compilation
phase can detect that COUNT is being applied to a whole relation (rather than to an individual
column), it can replace COUNT with COUNT_STAR and achieve the desired result.
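To make the difference concrete, here is a small Python model of the two behaviors. It is a hypothetical illustration, not Pig's implementation: records are assumed to be plain tuples, `group A by $0` is modeled as a dict of bags (with None as its own key), `count` follows the COUNT behavior described above (only the first field of each tuple is checked), and `count_star` counts every tuple. `RelationRef`, `Projection`, `FuncCall` and `rewrite_count` are made-up stand-ins for plan nodes, sketching the kind of rewrite this JIRA proposes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

Record = Tuple[Any, ...]

def group_by_first(relation: List[Record]) -> Dict[Any, List[Record]]:
    """Models `B = group A by $0;`: one bag of whole records per key."""
    groups: Dict[Any, List[Record]] = {}
    for rec in relation:
        groups.setdefault(rec[0], []).append(rec)
    return groups

def count(bag: List[Record]) -> int:
    """COUNT as described above: skip tuples whose FIRST field is null."""
    return sum(1 for t in bag if t[0] is not None)

def count_star(bag: List[Record]) -> int:
    """COUNT_STAR: count every tuple, like SQL COUNT(*)."""
    return len(bag)

# Hypothetical plan nodes (not Pig classes) for the proposed rewrite.
@dataclass
class RelationRef:   # a whole relation, e.g. COUNT(A)
    name: str

@dataclass
class Projection:    # a single projected column, e.g. COUNT(A.$0)
    relation: str
    column: int

@dataclass
class FuncCall:
    name: str
    arg: Union[RelationRef, Projection]

def rewrite_count(call: FuncCall) -> FuncCall:
    """Swap COUNT for COUNT_STAR only when it is applied to a whole relation."""
    if call.name == "COUNT" and isinstance(call.arg, RelationRef):
        return FuncCall("COUNT_STAR", call.arg)
    return call

# Two-field relation A; note (2, None): only the first field matters to COUNT.
A = [(1, "x"), (None, "y"), (None, "z"), (2, None)]
B = group_by_first(A)
C = {key: (count(bag), count_star(bag)) for key, bag in B.items()}
print(C)  # {1: (1, 1), None: (0, 2), 2: (1, 1)}
```

The null-key group is where the two disagree: COUNT reports 0 while COUNT_STAR (and SQL COUNT(*)) reports 2. The rewrite leaves COUNT on a projected column untouched, since per-column null skipping is the intended behavior there.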

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1014
>                 URL: https://issues.apache.org/jira/browse/PIG-1014
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.

