hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-628) PERFORMANCE: Misc. optimizations including optimization in Tuple serialization, set up of PigMapReduce & PigCombiner, accessing index in POLocalRearrange
Date Thu, 22 Jan 2009 22:04:00 GMT

     [ https://issues.apache.org/jira/browse/PIG-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Olga Natkovich resolved PIG-628.
--------------------------------

    Resolution: Fixed

patch committed, thanks pradeep

> PERFORMANCE: Misc. optimizations including optimization in Tuple serialization, set up
of PigMapReduce & PigCombiner, accessing index in POLocalRearrange
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-628
>                 URL: https://issues.apache.org/jira/browse/PIG-628
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>
>         Attachments: PIG-628.patch
>
>
> - Currently DefaultTuple.write() needlessly writes a marker for null/not null. This is
already handled by PigNullableWritable for keys and NullableTuple for values. Nested null
tuples inside a tuple are written out as nulls in DataReaderWriter.writeDatum. So the null/not
null marker in DefaultTuple can be avoided.
> - In PigMapReduce and PigCombiner the roots and leaves of the plans are calculated in
each reduce() call. Instead these can be computed in configure() one time.
> - In each call of POLocalRearrange.getNext(), a new lroutput tuple is created whose first
position is filled with index, second with key and third with value - this can be optimized
by having a tuple member in POLocalRearrange which is reused in each getNext() call. Further,
the first position of this tuple can be pre-filled with the index in the setIndex() method
of POLocalRearrange at script compile time.
> - In POCombinerPackage, the metadata data structures to figure out which parts of the
value are present in the key can be set up in the setKeyInfo() method at compile time. This
is because we currently use POCombinerPackage only with a "group by". Hence we don't need
to look up the metadata at run time based on input index since there will be only one input
(index = 0)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message