hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1062) load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
Date Mon, 16 Nov 2009 23:43:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778654#action_12778654
] 

Pradeep Kamath commented on PIG-1062:
-------------------------------------

Review comments:
In SampleLoader.java
====================
Isn't the idea of SampleLoader only to carry common code for RandomSampleLoader and PoissonLoader
and add a computeSamples() method? - Looks like now it has the getNext() implementation
needed by RandomSampleLoader in it now. Should we move that to RandomSampleLoader instead?


{code}
134             System.err.println("Sample " + samples[nextSampleIdx]);
{code}
Debug statement above should be removed.


Why is skipNext() needed? Can't loader.getNext() == null be used instead? If so, is recordReader
needed?

In RandomSampleLoader.java
==========================
XXX FIXME comment (put in by me :))should be removed

I think we should move the actual getNext() implementation code from SampleLoader to here

In PoissonSampleLoader.java
============================

{code}
 40         // this will be value of first column in the special row   
{code}
I think this is no longer the case - should be removed.


{code}
    58     // memory per sample. divide this by avgTupleMemSize to get skipInterval 
     59     private long memPerSample=0;
     60 
{code}
Should the above be called memToSkipPerSample?


{code}
 104         if(skipInterval == -1){
{code}
It doesn't look like skipInterval is initialized to -1


Instead of keeping track of max. num of columns in the different rows and then appending the
special marker string and num of rows at the end, would it be better to just have these as
the
first two fields of the last tuple emitted and then introduce a split-union combination to

ensure that the foreach pipeline gets the regular tuples (excluding the special tuple)?



> load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc
interface 
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
>
>
> This is part of the effort to implement new load store interfaces as laid out in http://wiki.apache.org/pig/LoadStoreRedesignProposal
.
> PigStorage and BinStorage are now working.
> SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to be changed
to work with new LoadFunc interface.  
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message