pig-dev mailing list archives

From "Jonathan Coveney (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2551) Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which gives you the lower levels for free
Date Tue, 13 Mar 2012 21:52:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228745#comment-13228745 ]

Jonathan Coveney commented on PIG-2551:
---------------------------------------

I have LongSum, COUNT, AlgSum, and AlgCount. AlgSum and AlgCount are just wrappers which extend
AlgebraicEvalFunc, returning the static classes from LongSum and COUNT respectively (the
purpose being that their Algebraic implementations are identical, so you're testing only the
overhead of the extra function calls in the Accumulator that AlgebraicEvalFunc gives you).

I then used Caliper to run a benchmark which instantiated each as an Accumulator<Long>
and streamed a DataBag through it.

See the code to set up:

{code}
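// Build the input bag once: `size` single-field tuples holding 0..size-1.
@Override protected void setUp() {
    try {
        theBag = mBagFactory.newDefaultBag();
        for (int i = 0; i < size; i++) {
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, i);
            theBag.add(t);
        }
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are diagnosable.
        throw new RuntimeException("Error in setup", e);
    }
}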
{code}

See the code to run:

{code}
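// Stream theBag through acc in chunks of at most perAcc tuples, then read the result.
public long go(Accumulator<Long> acc) {
    try {
        Iterator<Tuple> it = theBag.iterator();
        while (it.hasNext()) {
            // Collect the next chunk into its own bag.
            DataBag tempBag = mBagFactory.newDefaultBag();
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                tempBag.add(it.next());
            }
            // accumulate() expects a tuple whose first field is the bag.
            Tuple t = mTupleFactory.newTuple(1);
            t.set(0, tempBag);
            acc.accumulate(t);
        }
        return acc.getValue();
    } catch (Exception e) {
        // Keep the original exception as the cause so failures are diagnosable.
        throw new RuntimeException("Error in go", e);
    }
}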
{code}

The parameter "perAcc" is how many elements are passed to the accumulate function per call,
and was set to 1000. The size (total number of tuples in the bag) was set to 1000000. There
were 10 trials.
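
For anyone who wants to see the chunking logic without a Pig checkout, here is a minimal
sketch of the same loop using plain Java collections. SimpleAccumulator, SumAccumulator and
ChunkedAccumulateSketch are hypothetical stand-ins made up for illustration; they are not
Pig's Accumulator, LongSum, or the actual benchmark class, and this is not a benchmark in
itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Pig's Accumulator interface (assumption, not the real API).
interface SimpleAccumulator<T> {
    void accumulate(List<Long> chunk);
    T getValue();
}

// Sums every value it is handed; loosely analogous to LongSum used as an accumulator.
class SumAccumulator implements SimpleAccumulator<Long> {
    private long sum = 0L;

    @Override public void accumulate(List<Long> chunk) {
        for (long v : chunk) {
            sum += v;
        }
    }

    @Override public Long getValue() {
        return sum;
    }
}

class ChunkedAccumulateSketch {
    // Mirrors go(): feed the values to acc in chunks of at most perAcc elements.
    static <T> T go(SimpleAccumulator<T> acc, List<Long> values, int perAcc) {
        if (perAcc <= 0) {
            // A non-positive chunk size would never consume the iterator.
            throw new IllegalArgumentException("perAcc must be positive");
        }
        Iterator<Long> it = values.iterator();
        while (it.hasNext()) {
            List<Long> chunk = new ArrayList<>(perAcc);
            for (int j = 0; it.hasNext() && j < perAcc; j++) {
                chunk.add(it.next());
            }
            acc.accumulate(chunk);
        }
        return acc.getValue();
    }

    // Mirrors setUp(): the values 0..size-1.
    static List<Long> range(int size) {
        List<Long> values = new ArrayList<>(size);
        for (long i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // Same parameters as the benchmark: size 1000000, perAcc 1000.
        long total = go(new SumAccumulator(), range(1_000_000), 1000);
        System.out.println(total); // prints 499999500000
    }
}
```

With size 1000000 and perAcc 1000, accumulate is called 1000 times per run, so any
per-call overhead in the wrapper is paid 1000 times per trial.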
                
> Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which gives you the lower levels for free
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2551
>                 URL: https://issues.apache.org/jira/browse/PIG-2551
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2551-0.patch, PIG-2551-1.patch
>
>
> This is more of a win for the Algebraic interface than the Accumulator interface, but
> the idea is that if you implement the Algebraic interface, you should get Accumulator/EvalFunc
> for free, and if you implement Accumulator, you should get EvalFunc for free. The win of this
> is that in cases such as JRuby, you don't have to muck around doing this yourself...you have
> them implement the algebraic portion, and the rest comes free (that is where this came out
> of, but I feel like it is generally useful enough).
> The next piece of work I'd like to do is making an easier to implement way to make Algebraic
> UDFs, but then again, my to do is huge :) Would love thoughts on this. If it doesn't make
> it into Pig, it's still going to come in the JRuby stuff, so I thought it'd at least be worth
> having it separate, tested, and available to everyone.
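
The "implement Algebraic, get Accumulator and EvalFunc for free" idea in the description
above can be sketched roughly as follows. This is an illustration under simplifying
assumptions, not the code in the attached patches: SimpleAlgebraic, AlgebraicBackedFunc and
SimpleLongSum are hypothetical names, and the three stages are plain methods here, whereas
Pig's real Algebraic interface returns the class names of the initial, intermediate and
final EvalFuncs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified algebraic contract (assumption, not Pig's Algebraic interface).
interface SimpleAlgebraic<I, P, R> {
    P initial(I input);             // map one input to a partial result
    P intermediate(List<P> parts);  // combine partial results into one
    R finish(P part);               // turn the last partial result into the answer
}

// Derives accumulate/getValue/exec from the three algebraic stages.
abstract class AlgebraicBackedFunc<I, P, R> implements SimpleAlgebraic<I, P, R> {
    private P running; // partial result so far; null until the first chunk arrives

    // Accumulator-style entry point: fold one chunk into the running partial.
    public void accumulate(List<I> chunk) {
        List<P> parts = new ArrayList<>();
        if (running != null) {
            parts.add(running);
        }
        for (I item : chunk) {
            parts.add(initial(item));
        }
        if (!parts.isEmpty()) {
            running = intermediate(parts);
        }
    }

    public R getValue() {
        // With no input yet, let intermediate() define the empty result.
        P part = (running != null) ? running : intermediate(new ArrayList<P>());
        return finish(part);
    }

    public void cleanup() {
        running = null;
    }

    // EvalFunc-style entry point: one call over the whole input.
    public R exec(List<I> all) {
        cleanup();
        accumulate(all);
        R result = getValue();
        cleanup();
        return result;
    }
}

// The only code a UDF author writes in this sketch: the algebraic portion.
class SimpleLongSum extends AlgebraicBackedFunc<Long, Long, Long> {
    @Override public Long initial(Long input) {
        return input;
    }

    @Override public Long intermediate(List<Long> parts) {
        long sum = 0L;
        for (long p : parts) {
            sum += p;
        }
        return sum;
    }

    @Override public Long finish(Long part) {
        return part;
    }

    public static void main(String[] args) {
        SimpleLongSum sum = new SimpleLongSum();
        sum.accumulate(Arrays.asList(1L, 2L, 3L));
        sum.accumulate(Arrays.asList(4L, 5L));
        System.out.println(sum.getValue());                    // prints 15
        System.out.println(sum.exec(Arrays.asList(10L, 20L))); // prints 30
    }
}
```

The extra calls through initial() and intermediate() on every chunk are the kind of
overhead the AlgSum/AlgCount benchmark above is meant to measure.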
