pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3956) UDF profile is often misleading
Date Thu, 22 May 2014 16:13:02 GMT

     [ https://issues.apache.org/jira/browse/PIG-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Cheolsoo Park updated PIG-3956:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk. Thank you Daniel for the review!

> UDF profile is often misleading
> -------------------------------
>
>                 Key: PIG-3956
>                 URL: https://issues.apache.org/jira/browse/PIG-3956
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-3956-1.patch
>
>
> Pig provides UDF profiling via the {{pig.udf.profile}} perperty. How it works is that
it samples every 100th UDF invocation and increments a counter by 100 x elapsed time.
> {code}
> private final static int TIMING_FREQ = 100;
> ...
> PigStatusReporter.getInstance().incrCounter(counterGroup, TIMING_COUNTER,
>     Math.round((System.nanoTime() - startNanos) / 1000) * TIMING_FREQ);
> {code}
> The problem is that this assumes every invocation takes about the same time, which doesn't
hold true oftentimes.
> Here is an example. TIMING_FREQ = 100.
> {code}
> set pig.udf.profile true;
> set opt.fetch false;
> l = LOAD 'foo' AS (value:chararray);
> f = FOREACH l GENERATE LOWER(value);
> DUMP f;
> {code}
> This job gives me the following counters-
> {code}
> org.apache.pig.builtin.LOWER - approx_invocations: 6,144,600
> org.apache.pig.builtin.LOWER - approx_microsecs: 130,913,800
> {code}
> However, if I change TIMING_FREQ to 1, I get the followings-
> {code}
> org.apache.pig.builtin.LOWER - approx_invocations: 6,144,248
> org.apache.pig.builtin.LOWER - approx_microsecs: 19,711,842
> {code}
> As can be seen, approx_microsecs is 6x larger when sampling every 100th invocation as
compared to when sampling every invocation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message