pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2632) Create a SchemaTuple which generates efficient Tuples via code gen
Date Mon, 09 Apr 2012 05:35:32 GMT

    [ https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249683#comment-13249683
] 

jiraposter@reviews.apache.org commented on PIG-2632:
----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4651/#review6784
-----------------------------------------------------------



trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/4651/#comment15009>

    instead of checking here if the tuple is generatable, the factory could default to the
regular TupleFactory if the generation fails. 



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15010>

    How complex is this class ? Not sure it is worth pulling the whole mahout just for this.



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15011>

    append...() methods should not be null
    (part of your TODO list?)



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15013>

    handle the case where other == null
    Here you get NullPointerException



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15014>

    same comment



trunk/src/org/apache/pig/data/SchemaTuple.java
<https://reviews.apache.org/r/4651/#comment15015>

    If you override equals(), you should override hashCode(). 2 object that are equal must
return the same hashcode



trunk/src/org/apache/pig/data/TupleFactory.java
<https://reviews.apache.org/r/4651/#comment15052>

    that's where we need a proper classloader



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15016>

    try avoiding accessing the PigContext statically. Add it as a parameter and see if it
can be passed from somewhere it is actually known.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15017>

    possibly we want to rename this and/or add something else for this file.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15018>

    ?
    either get rid of the empty catch block or document why it is Ok.
    
    Here I don't see why it would be Ok that we can not instantiate that class



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15019>

    "org.apache.pig.generated." + classname ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15020>

    "org/apache/pig/generated/" + classname + ".class" ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15021>

    we should probably not compile in the current dir.
    Put it in the temp dir intead



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15022>

    this could be calling a method in the parent



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15023>

    this could be in the parent



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15024>

    stuff which is not calling a generated field directly should be pulled up.



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15025>

    same comment



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15026>

    same



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15027>

    this is wasteful. Maybe DataByteArray could provide a static compare(byte[], byte[]) ?



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15028>

    duplicated code



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15029>

    this can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15030>

    the exception handling can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15031>

    "setPos_" + fieldPos + "((Boolean)val);"



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15032>

    same with a mapping for the java object for the type 



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15033>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15034>

    you can just remove the call to box, it should work



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15035>

    this can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15036>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15037>

    can be pulled up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15038>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15039>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15040>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15041>

    pull up



trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java
<https://reviews.apache.org/r/4651/#comment15042>

    pull up
    



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15043>

    Ok, temporary



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15044>

    do we need this. Let's just fall back to regular tuple when generation fails, so that
we don't need to keep those in sync



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15045>

    The factory could be generated as well so that we don't need to use reflection here.
    Reflection is slower to create new instances.



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15046>

    The parameter type is enough to differentiate.
    getSchemaTupleFactory(Schema schema)



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15047>

    same



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15048>

    getTupleClass()



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15049>

    getTupleClass()



trunk/src/org/apache/pig/data/SchemaTupleFactory.java
<https://reviews.apache.org/r/4651/#comment15050>

    getTupleClass()



trunk/src/org/apache/pig/data/Tuple.java
<https://reviews.apache.org/r/4651/#comment15051>

    typo


- Julien


On 2012-04-08 22:26:29, Jonathan Coveney wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4651/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-08 22:26:29)
bq.  
bq.  
bq.  Review request for pig and Julien Le Dem.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema
on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary
tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily
depends on the data, though). Need to do get/set tests, but assuming that it's on par (or
even faster) than Tuple, the memory gain is huge.
bq.  
bq.  Need to clean up the code and add tests.
bq.  
bq.  Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to
UDF's. The next step is to make a SchemaBag, where I think the serialization savings will
be really huge.
bq.  
bq.  Needs tests and comments, but I want the code to settle a bit.
bq.  
bq.  
bq.  This addresses bug PIG-2632.
bq.      https://issues.apache.org/jira/browse/PIG-2632
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/bin/pig 1310666 
bq.    trunk/build.xml 1310666 
bq.    trunk/ivy.xml 1310666 
bq.    trunk/ivy/libraries.properties 1310666 
bq.    trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
1310666 
bq.    trunk/src/org/apache/pig/data/BinInterSedes.java 1310666 
bq.    trunk/src/org/apache/pig/data/FieldIsNullException.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/PrimitiveTuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/SchemaTuple.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/SchemaTupleFactory.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/data/Tuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/TupleFactory.java 1310666 
bq.    trunk/src/org/apache/pig/data/TypeAwareTuple.java 1310666 
bq.    trunk/src/org/apache/pig/data/utils/SedesHelper.java PRE-CREATION 
bq.    trunk/src/org/apache/pig/impl/PigContext.java 1310666 
bq.    trunk/src/org/apache/pig/newplan/logical/expression/UserFuncExpression.java 1310666

bq.  
bq.  Diff: https://reviews.apache.org/r/4651/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jonathan
bq.  
bq.


                
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-3.patch
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema
on the frontend, we can code generate Tuples which can be used for fun and profit. In rudimentary
tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (heavily heavily
depends on the data, though). Need to do get/set tests, but assuming that it's on par (or
even faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to
UDF's. The next step is to make a SchemaBag, where I think the serialization savings will
be really huge.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message