hadoop-pig-dev mailing list archives

From "Ying He (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
Date Fri, 30 Oct 2009 16:05:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771979#action_12771979 ]

Ying He commented on PIG-1062:
------------------------------

I suggest adding the total number of tuples in a split as an extra field on the last sample
from that split. All other sample tuples can carry NULL in this field. Then
PartitionSkewedKey.calculateReducers can add up this field across all the samples to get the
total number of tuples in the input.
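
A minimal sketch of this idea in plain Java, without any Pig classes. The class name
SampleRowCount, the methods tagSamples and totalTuples, and the Object[] tuple layout
(sample fields followed by one trailing count field) are all hypothetical and chosen only
for illustration; they are not the actual SampleLoader or PartitionSkewedKey code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SampleRowCount {

    // Loader side (hypothetical): append one trailing field to every sample.
    // Only the last sample of the split carries the split's tuple count;
    // the field stays null on all other samples.
    public static List<Object[]> tagSamples(List<Object[]> samples, long tuplesInSplit) {
        List<Object[]> out = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            Object[] src = samples.get(i);
            // copyOf pads the extra slot with null
            Object[] dst = Arrays.copyOf(src, src.length + 1);
            if (i == samples.size() - 1) {
                dst[src.length] = Long.valueOf(tuplesInSplit);
            }
            out.add(dst);
        }
        return out;
    }

    // Consumer side (hypothetical): sum the trailing field over all samples,
    // skipping the nulls, to recover the total number of input tuples.
    public static long totalTuples(List<Object[]> allSamples) {
        long total = 0;
        for (Object[] t : allSamples) {
            Object count = t[t.length - 1];
            if (count != null) {
                total += (Long) count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Object[]> split1 = new ArrayList<>();
        split1.add(new Object[] {"a"});
        split1.add(new Object[] {"b"});

        List<Object[]> split2 = new ArrayList<>();
        split2.add(new Object[] {"c"});

        // Samples from both splits end up in one bag, all with the same shape.
        List<Object[]> all = new ArrayList<>();
        all.addAll(tagSamples(split1, 100L));
        all.addAll(tagSamples(split2, 250L));

        System.out.println(totalTuples(all)); // prints 350
    }
}
```

Because every sample keeps the same shape, the count travels with the ordinary samples and
the consumer only needs a null check.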

If we use a separate tuple with a different format to represent the total number of tuples,
that would involve a bigger change. The sampling job currently adds an "all" to every sample
to group them into one bag, and then sorts the tuples by key. If the tuples have different
formats, the execution plan has to become more complex to deal with these special tuples.

> load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>
> This is part of the effort to implement new load store interfaces as laid out in
> http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) need to be changed
> to work with the new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

