pig-dev mailing list archives

From "Dmitriy V. Ryaboy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1348) PigStorage making unnecessary byte array copy when storing data
Date Wed, 07 Apr 2010 19:20:33 GMT

    [ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854654#action_12854654 ]

Dmitriy V. Ryaboy commented on PIG-1348:
----------------------------------------

In the spirit of better java and micro-optimizations:

StorageUtil does things like this to convert to bytes:

{code}
out.write(((Integer)field).toString().getBytes());
{code}

Integer's toString() method creates a new string every time, even if the same integer (value-wise)
is being converted to a String.  This is better:

{code}
out.write(String.valueOf(field).getBytes());
{code}

(This reuses the values, and also collapses the case statement a fair bit, cleaning up the
code -- we can batch Integer, Double, etc. together and fall through to just one line of code.)
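
A minimal sketch of what the collapsed case statement could look like. The class, the type codes and the helper names (FieldWriter, typeOf, writeField) are made up for illustration and are not the actual StorageUtil code; the explicit UTF-8 charset is also an assumption of this sketch. One caveat worth checking before counting on the allocation savings: String.valueOf(Object) is documented to return obj.toString() for a non-null argument, so what the sketch demonstrates is the simpler control flow, and any performance gain should be measured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class FieldWriter {
    // Hypothetical type codes standing in for whatever the real switch uses.
    static final int OTHER = 0, INTEGER = 1, LONG = 2, FLOAT = 3, DOUBLE = 4;

    // Hypothetical type lookup, assumed only for this sketch.
    static int typeOf(Object field) {
        if (field instanceof Integer) return INTEGER;
        if (field instanceof Long) return LONG;
        if (field instanceof Float) return FLOAT;
        if (field instanceof Double) return DOUBLE;
        return OTHER;
    }

    static void writeField(OutputStream out, Object field) throws IOException {
        switch (typeOf(field)) {
            case INTEGER:
            case LONG:
            case FLOAT:
            case DOUBLE:
                // All numeric cases fall through to a single conversion line.
                out.write(String.valueOf(field).getBytes(StandardCharsets.UTF_8));
                break;
            default:
                throw new IllegalArgumentException("unsupported field type: " + field);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeField(out, 42);
        writeField(out, 1.5);
        System.out.println(out.toString("UTF-8"));
    }
}
```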

This discussion should probably go into a separate ticket.

> PigStorage making unnecessary byte array copy when storing data
> ---------------------------------------------------------------
>
>                 Key: PIG-1348
>                 URL: https://issues.apache.org/jira/browse/PIG-1348
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using Runtime.getRuntime().maxMemory().
> It then takes 10% of this memory (by default, though configurable) and divides it among the
> bags. It keeps track of the memory used by the bags and proactively spills when their
> memory usage comes close to these limits. Given all this, in theory InternalCachedBag should not
> run out of memory when presented with more data than it can handle. But in practice we
> find OOM happening.
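
The per-bag budget described in the quoted issue text can be sketched roughly as follows. This is an illustration of the arithmetic only; BagMemoryBudget and perBagLimit are hypothetical names, not Pig's InternalCachedBag implementation, and the bag count of 4 is an arbitrary example value.

```java
final class BagMemoryBudget {
    // Hypothetical helper: bytes each bag may hold before it should spill.
    // maxMemory is the VM estimate, fraction is the configurable share (0.10 by default
    // according to the description), numBags is how many bags share that budget.
    static long perBagLimit(long maxMemory, double fraction, int numBags) {
        if (numBags <= 0) {
            throw new IllegalArgumentException("numBags must be positive");
        }
        long shared = (long) (maxMemory * fraction);
        return shared / numBags;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        // Example: default 10% share split across 4 bags.
        System.out.println(perBagLimit(max, 0.10, 4));
    }
}
```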

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
