pig-dev mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1447) Tune memory usage of InternalCachedBag
Date Tue, 17 Aug 2010 00:46:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Thejas M Nair updated PIG-1447:

    Attachment: L15_modified.pig

The search for a better default value for pig.cachedbag.memusage was prompted by changes
in PIG-1443 and PIG-1492. Before the changes made as part of those jiras, pig was
underestimating the memory footprint of data.
For data of 'typical' sizes (chararray/bytearray with fewer than 20 chars), the new memory
size estimates can be up to 2 times those of the old, unchanged version (0.6.0).

I ran pig queries with the max heap size for tasks set to 1GB, and compared 0.1f and 0.2f
as values for pig.cachedbag.memusage. I ran the pigmix v1 queries (L1-L12), a modified
pigmix v1 that specifies types, and a modified L15 query that has several distincts in a
nested foreach statement.
Only queries L5, L7 and L15 had proactive spills. The number of spills goes down with 0.2f
as the value, but the total runtime is practically the same.
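To make the two settings concrete, here is a minimal sketch of the memory budget each one implies for the 1GB heap used in these tests. It assumes the property is applied as a plain fraction of the task's maximum heap; that reading, and the helper name bag_budget_mb, are illustrative assumptions rather than something taken from the pig source.

```python
# Hypothetical helper: assumes pig.cachedbag.memusage is a plain fraction
# of the task's maximum heap (an assumption made for illustration only).
HEAP_BYTES = 1024 ** 3  # the 1GB max heap used in the tests


def bag_budget_mb(memusage: float, heap_bytes: int = HEAP_BYTES) -> float:
    """Return the memory budget in MB implied by a memusage fraction."""
    return memusage * heap_bytes / (1024 ** 2)


# Compare the two values that were tested.
for fraction in (0.1, 0.2):
    print(f"memusage={fraction}: about {bag_budget_mb(fraction):.0f} MB")
```

Under that assumption, going from 0.1f to 0.2f roughly doubles the space available before a proactive spill is triggered.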

(See PIG-1524 for more on the spills currently reported.)

|| query || spills with 0.1f || spills with 0.2f || 
| L5 (original pigmix) | 496k | 0 |
| L7 (original pigmix) | 82k | 0 |
| L5 (with types) | 609k | 82k |
| L7 (with types) | 128k | 0 |
| L15_modified (attached to jira) |  501k | 326k |
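The relative drop in spills per query can be worked out directly from the table. The sketch below only re-computes the numbers listed above (in thousands); the dictionary layout and the function name reduction_pct are illustrative choices.

```python
# Spill counts in thousands, copied from the table:
# (spills with 0.1f, spills with 0.2f)
spills = {
    "L5 (original pigmix)": (496, 0),
    "L7 (original pigmix)": (82, 0),
    "L5 (with types)": (609, 82),
    "L7 (with types)": (128, 0),
    "L15_modified": (501, 326),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in spills when moving from 0.1f to 0.2f."""
    return 100.0 * (before - after) / before


reductions = {query: reduction_pct(b, a) for query, (b, a) in spills.items()}

for query, pct in reductions.items():
    print(f"{query}: {pct:.1f}% fewer spills")
```

Three of the five queries stop spilling entirely at 0.2f, while L15_modified still keeps roughly two thirds of its spills.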

Some other factors to consider while determining a new value for this property:
- as a result of the issue described in PIG-1544, the proactive-spill bags don't all share
the memory.
- the default value should be low enough that queries work fine in most cases. Expert
users can tweak this to improve performance.
- the value of 0.1f has been used for a long time (with the old memory estimate formula),
and seems to work for most cases.
- during the above tests, no other queries were running, so the disks were relatively free.

I propose that we increase the default value to 0.15f to accommodate the changes in memory
size estimation, so that the spill behavior is closer to what it has been with 0.6 and 0.7.
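The reasoning behind the proposed value can be illustrated with simple arithmetic. The sketch assumes a simplified model that is not taken from the pig source: a bag spills once its estimated size exceeds the memusage fraction of the heap, and the new estimate is between 1 and 2 times the old one. Under that model, a fraction used with the new estimate behaves like that fraction divided by the inflation factor under the old estimate. The function name equivalent_old_fraction is hypothetical.

```python
def equivalent_old_fraction(new_fraction: float, inflation: float) -> float:
    """Old-estimate fraction that would spill at the same point.

    inflation is the ratio new_estimate / old_estimate, assumed here to lie
    between 1.0 (no change) and 2.0 (the 'up to 2 times' case).
    """
    return new_fraction / inflation


for fraction in (0.1, 0.15, 0.2):
    low = equivalent_old_fraction(fraction, 2.0)   # estimates doubled
    high = equivalent_old_fraction(fraction, 1.0)  # estimates unchanged
    print(f"new {fraction}: behaves like old {low:.3f} to {high:.3f}")
```

In this simplified model, 0.15f maps to an old-estimate range of 0.075 to 0.15, which brackets the long-standing 0.1f, whereas keeping 0.1f would map to 0.05 to 0.1 and spill earlier than before.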

> Tune memory usage of InternalCachedBag
> --------------------------------------
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>         Attachments: L15_modified.pig
> We need to find a better value for "pig.cachedbag.memusage".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
