pig-dev mailing list archives

From "Scott Carey (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known
Date Tue, 10 Jan 2012 22:24:42 GMT

    [ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183639#comment-13183639 ]

Scott Carey commented on PIG-2359:
----------------------------------

Performance comments:

bq. In PrimitiveTuple.get(), I wonder if you'd get faster access if you removed the array
bounds check. Java is going to do that for you anyway. You can catch the IndexOutOfBoundsException
and rethrow it with a nicer error message.

That is generally slower.
1. The JVM will detect your checks and not do its own bounds checks if yours are sufficient.
2. The JVM will profile the method, and compile the checks with the right CPU branch hints
and instruction layout based on the odds that the branch is taken.
3. If it is out of bounds, it is a hundred times faster to find out via an if statement than
by throwing and catching an exception.

All of the above are much more noticeable in a loop than in a single access, so it may not
help much here.
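
For illustration, the two approaches being compared can be sketched as follows. This is a minimal sketch, not code from the PIG-2359 patch: the class name BoundsCheckSketch, the long[] backing array, and both method names are assumptions made up for the example.

```java
// Hypothetical stand-in for a primitive-backed tuple; not the actual PrimitiveTuple.
final class BoundsCheckSketch {
    private final long[] values;

    BoundsCheckSketch(long[] values) {
        this.values = values;
    }

    // Explicit check: a plain branch that the JIT can profile, and which can
    // make the implicit array bounds check on values[i] redundant.
    long getWithCheck(int i) {
        if (i < 0 || i >= values.length) {
            throw new IndexOutOfBoundsException(
                "Requested field " + i + " but tuple has " + values.length + " fields");
        }
        return values[i];
    }

    // Alternative from the quoted suggestion: rely on the implicit check and
    // translate the exception. The failure path pays for creating, throwing
    // and catching an exception before the nicer one is even built.
    long getWithCatch(int i) {
        try {
            return values[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IndexOutOfBoundsException(
                "Requested field " + i + " but tuple has " + values.length + " fields");
        }
    }
}
```

Both methods return the same values and raise the same message on a bad index; the difference is only in how the failure is detected. Any actual speed difference would need to be measured in the real access pattern.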

bq. I did that when I was going to use ByteArrayBuffer, offered by httpcore. The nice thing
about it is that it's resizable, but then again it doesn't have the r/wLong, r/wInt, etc methods,
so I reverted to regular nio.ByteBuffer.

Note, nio.ByteBuffer is 'slow' (but very handy).  Unfortunately, all calls to it are virtual
method calls and not inlined.  This is because of its dual heap / direct nature.  If serializing
data to a byte[], writing your own private method to swizzle the int/long into the bytes can
yield significant performance gains when serialization is a hot spot, since your method will be
inlined at critical call sites while ByteBuffer's methods will not.
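
As a rough sketch of what such hand-written methods might look like, the following writes and reads int and long values directly in a byte[]. The class and method names are hypothetical, and big-endian order is an assumption chosen to match ByteBuffer's default byte order.

```java
// Hypothetical helper; names and layout are illustrative only.
final class ByteSwizzleSketch {
    private ByteSwizzleSketch() {}

    // Write a 32-bit int at buf[off..off+3], most significant byte first.
    static void writeInt(byte[] buf, int off, int v) {
        buf[off]     = (byte) (v >>> 24);
        buf[off + 1] = (byte) (v >>> 16);
        buf[off + 2] = (byte) (v >>> 8);
        buf[off + 3] = (byte) v;
    }

    // Write a 64-bit long at buf[off..off+7], most significant byte first.
    static void writeLong(byte[] buf, int off, long v) {
        for (int i = 0; i < 8; i++) {
            buf[off + i] = (byte) (v >>> (56 - 8 * i));
        }
    }

    // Read back a 32-bit int; the 0xFF mask undoes sign extension of each byte.
    static int readInt(byte[] buf, int off) {
        return ((buf[off] & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }

    // Read back a 64-bit long, accumulating one byte at a time.
    static long readLong(byte[] buf, int off) {
        long v = 0L;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (buf[off + i] & 0xFFL);
        }
        return v;
    }
}
```

Because the layout matches ByteBuffer's default order, bytes written this way can still be read with ByteBuffer.wrap(buf).getInt(off) or getLong(off) where convenience matters more than speed. Whether this is worth doing depends on profiling showing the ByteBuffer calls to be a hot spot.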
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch, PIG-2359.3.patch, PIG-2359.4.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible to avoid
> this overhead, which would result in significant memory savings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
