hadoop-pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
Date Tue, 17 Nov 2009 19:14:39 GMT

[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779054#action_12779054 ]

Thejas M Nair commented on PIG-1062:

In SampleLoader.java
Isn't the idea of SampleLoader only to carry common code for RandomSampleLoader and PoissonSampleLoader
and add a computeSamples() method? It looks like it now has the getNext() implementation
needed by RandomSampleLoader. Should we move that to RandomSampleLoader instead?

RandomSampleLoader.getNext() is fairly generic; it can be used by any new sample loader class
where the number of samples to be taken in each map is known in advance. So keeping this
getNext() implementation in SampleLoader can be useful in the future.
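
To illustrate the point above, here is a minimal sketch of a base class that owns a shared getNext() while subclasses only state how many samples they want. All names here (BaseSampler, FirstNSampler, numSamples()) are hypothetical and are not taken from the patch, and for brevity the sketch takes the first N records rather than sampling randomly.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class FixedCountSamplerSketch {
    // Hypothetical base class; stands in for the role SampleLoader plays above.
    static abstract class BaseSampler {
        private final Iterator<String> source;
        private int emitted = 0;

        BaseSampler(Iterator<String> source) { this.source = source; }

        // Subclasses only declare how many samples one map should produce.
        abstract int numSamples();

        // Shared getNext(): next sample, or null once the quota is met
        // or the input is exhausted.
        String getNext() {
            if (emitted >= numSamples() || !source.hasNext()) {
                return null;
            }
            emitted++;
            return source.next();
        }
    }

    // Hypothetical subclass with a sample count fixed at construction time.
    static class FirstNSampler extends BaseSampler {
        private final int n;
        FirstNSampler(Iterator<String> source, int n) { super(source); this.n = n; }
        @Override int numSamples() { return n; }
    }

    // Drains a sampler and reports how many samples it produced.
    static int countSamples(List<String> records, int n) {
        BaseSampler sampler = new FirstNSampler(records.iterator(), n);
        int count = 0;
        while (sampler.getNext() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(countSamples(records, 3));
    }
}
```

A new sampler with a known per-map sample count would then only need to override numSamples().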

Why is skipNext() needed? Can't loader.getNext() == null be used instead? If so, is recordReader
skipNext() calls recordReader.getNext() which does not parse the record into a tuple, unlike
loader.getNext(). This way records can be more efficiently skipped.
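
A rough sketch of the difference, assuming a reader that hands back raw records and a loader that parses them. RawReader, Loader and the String[] "tuple" are hypothetical stand-ins, not the classes in the patch; the parsed counter exists only to show that skipping does no parsing.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SkipSketch {
    // Hypothetical reader: hands back raw, unparsed records.
    static class RawReader {
        private final Iterator<String> lines;
        RawReader(List<String> lines) { this.lines = lines.iterator(); }

        // Returns the next raw record, or null at end of input.
        String getNext() { return lines.hasNext() ? lines.next() : null; }
    }

    // Hypothetical loader: turning a raw record into fields is the costly step.
    static class Loader {
        private final RawReader reader;
        int parsed = 0; // number of records actually parsed

        Loader(RawReader reader) { this.reader = reader; }

        // Reads one record and parses it into a tuple (here, a String[]).
        String[] getNext() {
            String raw = reader.getNext();
            if (raw == null) {
                return null;
            }
            parsed++;
            return raw.split(",");
        }

        // Advances past one record without parsing it; false at end of input.
        boolean skipNext() {
            return reader.getNext() != null;
        }
    }

    // Skips up to `skip` records, then reads one; returns how many were parsed.
    static int parsesAfterSkipping(List<String> records, int skip) {
        Loader loader = new Loader(new RawReader(records));
        for (int i = 0; i < skip; i++) {
            if (!loader.skipNext()) {
                break;
            }
        }
        loader.getNext();
        return loader.parsed;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("1,a", "2,b", "3,c", "4,d");
        System.out.println(parsesAfterSkipping(records, 3));
    }
}
```

Skipping through loader.getNext() instead would parse every skipped record as well.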

I will create a new patch addressing other comments.

> load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc
> ---------------------------------------------------------------------------------------------------
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
> This is part of the effort to implement new load store interfaces as laid out in http://wiki.apache.org/pig/LoadStoreRedesignProposal
> PigStorage and BinStorage are now working.
> SampleLoader and subclasses (RandomSampleLoader, PoissonSampleLoader) need to be changed to work with new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
