hadoop-pig-dev mailing list archives

From "Rekha (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-626) Statistics (records read by each mapper and reducer)
Date Wed, 10 Jun 2009 06:41:07 GMT

    [ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717946#action_12717946 ]

Rekha commented on PIG-626:


This is nice. I need to capture the data below for each script and save it into a database as a
record of the run.


In Pig 2.2 I see similar information in the console logs. Can the statistics be pulled from there?

In any case, I do not want stats collection to be an offline process, as it is in the test files in
the package.
Ideally, I would like to call the stats module directly from the Pig script itself, e.g.:
    stats.record();                  // first line
    stats.flush('out<jobid>.txt');   // last line

The push to the database could then be done by a cron job.
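
As a rough illustration of the flush step proposed above, the Java sketch below writes one job's counters to a file named out<jobid>.txt as tab-separated "jobId, counter, value" lines that a cron job could later load into a database. It assumes the statistics are available as a Map<String, Map<String, String>> keyed by job ID, which is how the sample code in the issue indexes them. The class and method names (StatsFlusher, flush) are hypothetical and not part of Pig, and the stats.record()/stats.flush() calls above are proposed syntax rather than an existing Pig feature.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Pig): dumps per-job statistics to out<jobid>.txt
// so that a separate cron job can push them into a database.
public class StatsFlusher {

    // Writes one "jobId<TAB>counter<TAB>value" line per counter of the given job
    // and returns the path of the file that was written.
    public static Path flush(Path dir, String jobId, Map<String, Map<String, String>> allStats)
            throws IOException {
        Map<String, String> jobStats = allStats.get(jobId);
        if (jobStats == null) {
            throw new IllegalArgumentException("No statistics for job " + jobId);
        }
        Path out = dir.resolve("out" + jobId + ".txt");
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, String> e : jobStats.entrySet()) {
                w.println(jobId + "\t" + e.getKey() + "\t" + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample values, only to exercise the helper.
        Map<String, String> job = new LinkedHashMap<>();
        job.put("PIG_STATS_MAP_OUTPUT_RECORDS", "42");
        Map<String, Map<String, String>> stats = new LinkedHashMap<>();
        stats.put("job_0001", job);
        Path written = flush(Files.createTempDirectory("pigstats"), "job_0001", stats);
        System.out.println(Files.readAllLines(written, StandardCharsets.UTF_8));
    }
}
```

Keeping the flush as a plain file write means the script run never blocks on the database; the cron loader only has to parse simple tab-separated lines.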

Any suggestions?

Thanks & Regards,

> Statistics (records read by each mapper and reducer)
> ----------------------------------------------------
>                 Key: PIG-626
>                 URL: https://issues.apache.org/jira/browse/PIG-626
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Shubham Chopra
>            Assignee: Shubham Chopra
>            Priority: Minor
>             Fix For: 0.3.0
>         Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, pigStats.patch, pigStats.patch,
>                      pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
> This uses Hadoop's counters framework. Initially, I am just interested in finding out the
> number of records read by each mapper/reducer, particularly for the last job in any script.
> Sample code to access the statistics for the last job:
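> // Look up the last job's statistics; a null reduce plan means the job was map-only.
> String reducePlan = stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
> if (reducePlan == null) {
>     // Map-only job: the records written are the map output records.
>     System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
> } else {
>     // The job has a reduce phase: the records written are the reduce output records.
>     System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
> }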
> The patch contains 7 test cases. These include tests for PigStorage and BinStorage, along
> with one for the multiple-MR-jobs case.
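
The sample code above reports the reduce-side output counter when the last job has a reduce plan and the map-side counter otherwise. A self-contained Java sketch of that selection follows. It assumes the statistics are a Map<String, Map<String, String>> keyed by job ID (the shape the sample's chained get calls imply) rather than calling Pig itself, and the class and method names (LastJobRecords, recordsWritten) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Pig-independent version of the sample's "records written" logic.
public class LastJobRecords {

    // Returns the "records written" counter for one job: the reduce output count
    // if the job had a reduce plan, otherwise the map output count (map-only job).
    // Returns null when no statistics were recorded for the job.
    public static String recordsWritten(Map<String, Map<String, String>> pigStats, String jobId) {
        Map<String, String> jobStats = pigStats.get(jobId);
        if (jobStats == null) {
            return null;
        }
        String reducePlan = jobStats.get("PIG_STATS_REDUCE_PLAN");
        return reducePlan == null
                ? jobStats.get("PIG_STATS_MAP_OUTPUT_RECORDS")
                : jobStats.get("PIG_STATS_REDUCE_OUTPUT_RECORDS");
    }

    public static void main(String[] args) {
        // Made-up values for a map-only job and a job with a reduce phase.
        Map<String, String> mapOnly = new HashMap<>();
        mapOnly.put("PIG_STATS_MAP_OUTPUT_RECORDS", "100");

        Map<String, String> withReduce = new HashMap<>();
        withReduce.put("PIG_STATS_REDUCE_PLAN", "placeholder-plan");
        withReduce.put("PIG_STATS_MAP_OUTPUT_RECORDS", "50");
        withReduce.put("PIG_STATS_REDUCE_OUTPUT_RECORDS", "7");

        Map<String, Map<String, String>> stats = new HashMap<>();
        stats.put("job_a", mapOnly);
        stats.put("job_b", withReduce);

        System.out.println("Records written : " + recordsWritten(stats, "job_a")); // prints "Records written : 100"
        System.out.println("Records written : " + recordsWritten(stats, "job_b")); // prints "Records written : 7"
    }
}
```

Fetching the per-job map once, instead of repeating the chained lookup in each branch, keeps the map-only versus map-reduce decision in a single place.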

This message is automatically generated by JIRA.
