hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
Date Mon, 02 Nov 2009 19:52:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772623#action_12772623 ]

Thejas M Nair commented on PIG-1062:
------------------------------------

Even after the interface changes, Pig can compute the file size by adding up the size of each
split (from InputSplit.getLength()). The documentation of that method in the interface does
not make it clear whether this is the size on disk, compressed or uncompressed, etc. Assuming it is
the size on disk (uncompressed), estimating the total memory the data will require is a challenge:
one has to make assumptions about the compression ratio and the serialization method.
Using Tuple.getMemorySize() while sampling will give more accurate numbers for the memory
the data will consume in the reducer.
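
A rough sketch of the two approaches, to make the difference concrete. This is not
code from the branch. SizedSplit and SizedTuple are hypothetical stand-ins that
mirror only InputSplit.getLength() and Tuple.getMemorySize(), so the example runs
without Hadoop or Pig on the classpath, and estimatedTupleCount is an assumed input
that would have to come from somewhere else.

```java
import java.util.List;

public class MemoryEstimateSketch {

    /** Hypothetical stand-in for an input split; mirrors InputSplit.getLength(). */
    interface SizedSplit {
        long getLength();
    }

    /** Hypothetical stand-in for a sampled tuple; mirrors Tuple.getMemorySize(). */
    interface SizedTuple {
        long getMemorySize();
    }

    /** Total input size: the sum of the lengths reported by each split. */
    static long totalInputBytes(List<SizedSplit> splits) {
        long total = 0;
        for (SizedSplit split : splits) {
            total += split.getLength();
        }
        return total;
    }

    /** Average in-memory size of the sampled tuples, or 0 for an empty sample. */
    static double avgTupleMemoryBytes(List<SizedTuple> sample) {
        if (sample.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (SizedTuple tuple : sample) {
            total += tuple.getMemorySize();
        }
        return (double) total / sample.size();
    }

    /**
     * Memory estimate built from the sample: average sampled tuple size times an
     * assumed tuple count. No compression ratio or serialization guess is needed.
     */
    static long estimateMemoryBytes(List<SizedTuple> sample, long estimatedTupleCount) {
        return Math.round(avgTupleMemoryBytes(sample) * estimatedTupleCount);
    }

    public static void main(String[] args) {
        List<SizedSplit> splits = List.<SizedSplit>of(() -> 64L, () -> 64L, () -> 32L);
        List<SizedTuple> sample = List.<SizedTuple>of(() -> 100L, () -> 140L);

        System.out.println(totalInputBytes(splits));           // 160
        System.out.println(estimateMemoryBytes(sample, 1000)); // 120000
    }
}
```

The split-based total only says how many bytes the input reports; turning that
into a memory figure still needs the compression and serialization assumptions
mentioned above, which the sample-based estimate avoids.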

> load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>
> This is part of the effort to implement new load store interfaces as laid out in http://wiki.apache.org/pig/LoadStoreRedesignProposal.
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses, RandomSampleLoader and PoissonSampleLoader, need to be changed to work with the new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

