hadoop-pig-dev mailing list archives

From "Chao Wang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1098) [zebra] Zebra Performance Optimizations
Date Tue, 01 Dec 2009 22:09:20 GMT

    [ https://issues.apache.org/jira/browse/PIG-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784468#action_12784468 ]

Chao Wang commented on PIG-1098:

Ideally, we should have a better structure for method pairs such as advance()/advanceCG(),
getKey()/getCGKey(), and getValue()/getCGValue() (ColumnGroup.java).
The only difference in the new *CG* methods is that they skip the "if (atEnd())" check. This
gives some performance gain at the cost of slightly degraded code readability.
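
To illustrate the trade-off, here is a minimal sketch of a checked/unchecked method pair. SketchScanner and its array-backed state are hypothetical stand-ins; only the method names and the "if (atEnd())" check come from the patch under review, and the real ColumnGroup code may differ.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scanner; NOT the actual Zebra ColumnGroup code.
final class SketchScanner {
    private final String[] keys;
    private int pos = 0;

    SketchScanner(String... keys) {
        this.keys = keys.clone();
    }

    boolean atEnd() {
        return pos >= keys.length;
    }

    // Checked variant: validates the position on every call.
    String getKey() {
        if (atEnd()) {
            throw new NoSuchElementException("scanner is at end");
        }
        return getCGKey();
    }

    // Unchecked variant: the caller must already know that atEnd() is false.
    String getCGKey() {
        return keys[pos];
    }

    // Checked variant: does nothing and returns false when already at end.
    boolean advance() {
        if (atEnd()) {
            return false;
        }
        return advanceCG();
    }

    // Unchecked variant: moves forward and reports whether a current row remains.
    boolean advanceCG() {
        pos++;
        return pos < keys.length;
    }
}
```

A caller that has already tested atEnd() in its loop condition could call getCGKey() and advanceCG() directly and skip the repeated check. Having the checked methods delegate to the unchecked ones, as assumed here, is one way to keep the duplicated logic in a single place.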

This is the first cut at performance improvement, and all of the above changes are inside the
ColumnGroup class, which is package private. They are therefore Zebra internal implementation
details that we can safely improve in the future. Overall, +1.

> [zebra] Zebra Performance Optimizations
> ---------------------------------------
>                 Key: PIG-1098
>                 URL: https://issues.apache.org/jira/browse/PIG-1098
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>            Priority: Minor
>             Fix For: 0.6.0, 0.7.0
>         Attachments: PIG-1098.patch
> Many in-core performance optimization opportunities exist in Zebra, such as removing
> redundant precautionary checks, using better collection types to reduce levels of indirection
> to in-memory objects, and ordering input splits by descending rather than ascending size. Observed
> wall-clock improvements for some Pig LOAD queries are around 10%.
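
As a rough illustration of the split-ordering item in the description above, the sketch below sorts split sizes largest first. SplitOrderSketch and largestFirst are hypothetical names, and plain Long sizes stand in for real input split objects. The issue does not say why descending order helps; one common rationale, assumed here, is that starting the largest splits first reduces the chance of a long task finishing last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; not taken from the PIG-1098 patch.
final class SplitOrderSketch {
    // Returns a new list with the split sizes ordered largest first.
    // The input list is left unmodified.
    static List<Long> largestFirst(List<Long> splitSizes) {
        List<Long> ordered = new ArrayList<>(splitSizes);
        ordered.sort(Comparator.reverseOrder());
        return ordered;
    }
}
```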

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
