pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2638) Optimize BinInterSedes treatment of longs
Date Thu, 31 May 2012 05:46:25 GMT

    [ https://issues.apache.org/jira/browse/PIG-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286345#comment-13286345
] 

Ashutosh Chauhan commented on PIG-2638:
---------------------------------------

Another option is to use VInts and their friends, which are used in core Hadoop and in other
parts of the ecosystem: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/VIntWritable.html

                
> Optimize BinInterSedes treatment of longs
> -----------------------------------------
>
>                 Key: PIG-2638
>                 URL: https://issues.apache.org/jira/browse/PIG-2638
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2638-0.patch, PIG-2638-1.patch
>
>
> During adventures in BinInterSedes, I noticed that integers are written in an optimized
> fashion, but longs are not. Given that, in the general case, we have to write type information
> anyway, we might as well do the same optimization for longs. That is to say, given that most
> longs won't have 8 bytes of information in them, why should we waste the space of serializing
> 8 bytes?
> This patch takes its inspiration from varint encoding per these two sources:
> http://javasourcecode.org/html/open-source/mahout/mahout-0.5/org/apache/mahout/math/Varint.java.html
> https://developers.google.com/protocol-buffers/docs/encoding
> Though, nicely enough, we don't actually have to use varints. Since we HAVE to write
> a 1-byte (8-bit) type header anyway, we might as well encode in it the number of bytes we
> had to write. I use zig-zag encoding so that negative numbers see the benefit as well.
> This should decrease the amount of serialized long data by a good bit.
> Patch incoming. It passes test-commit in 0.11.
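
For readers following along, the idea in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the attached patches: the class name ZigZagLongSketch, the LONG_BASE tag value, and the helper names are all hypothetical, and the real BinInterSedes type constants may be laid out differently. The sketch zig-zag encodes the long, counts how many bytes the result needs, folds that count into the one-byte type tag, and writes only those bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ZigZagLongSketch {
    // Hypothetical tag layout: tag = LONG_BASE + number of payload bytes (0..8).
    static final int LONG_BASE = 100;

    // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small magnitudes stay small.
    static long zigZagEncode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    static long zigZagDecode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Bytes needed to hold z as an unsigned value (0 bytes when z == 0).
    static int bytesNeeded(long z) {
        return (64 - Long.numberOfLeadingZeros(z) + 7) / 8;
    }

    static void writeLong(DataOutputStream out, long value) throws IOException {
        long z = zigZagEncode(value);
        int n = bytesNeeded(z);
        out.writeByte(LONG_BASE + n);        // the type tag also carries the length
        for (int i = n - 1; i >= 0; i--) {   // big-endian payload, n bytes
            out.writeByte((int) (z >>> (8 * i)));
        }
    }

    static long readLong(DataInputStream in) throws IOException {
        int n = in.readUnsignedByte() - LONG_BASE;
        long z = 0;
        for (int i = 0; i < n; i++) {
            z = (z << 8) | in.readUnsignedByte();
        }
        return zigZagDecode(z);
    }

    static byte[] serialize(long value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeLong(out, value);
        }
        return buf.toByteArray();
    }

    static long deserialize(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readLong(in);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] samples = {0L, -1L, 1L, 300L, -300L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long v : samples) {
            byte[] b = serialize(v);
            System.out.println(v + " -> " + b.length + " byte(s), round trip ok: "
                    + (deserialize(b) == v));
        }
    }
}
```

In this sketch a long whose zig-zag form fits in one byte costs two bytes on the wire (tag plus payload) rather than nine, and the worst case is no larger than writing the tag followed by a full 8-byte long.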

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
