oozie-user mailing list archives

From Eduardo Afonso Ferreira <eafon...@yahoo.com>
Subject Re: Capturing Pig action output
Date Thu, 06 Sep 2012 17:27:31 GMT

I'm still interested in learning whether a pre-packaged 3.2 version is available that
I can install, but I made a little more progress by adding another jar to my app,
json-simple-1.1.1.jar, which resolved the NoClassDefFoundError I experienced.

Now I see the stats field in the Oozie database (WF_ACTIONS.stats) filled with a JSON
representation of the PigStats that I'm interested in. But I still can't see it when I run
-info with -verbose. Am I missing something?
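
For anyone reading the stats straight from the database in the meantime, here is a minimal Python sketch of unpacking that JSON. It assumes the WF_ACTIONS.stats value has already been fetched as a string. The key names in the sample are made up for illustration, and the real PigStats layout may well differ.

```python
import json


def flatten_stats(stats_json, sep="."):
    """Flatten a nested JSON document into {dotted.key: value} pairs."""
    flat = {}

    def walk(node, prefix):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}{sep}{key}" if prefix else str(key))
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{prefix}{sep}{index}" if prefix else str(index))
        else:
            # Leaf value: record it under the accumulated key path.
            flat[prefix] = node

    walk(json.loads(stats_json), "")
    return flat


# Hypothetical sample; not taken from a real WF_ACTIONS.stats row.
sample = '{"job_1": {"RECORD_WRITTEN": "42", "outputs": ["a", "b"]}}'

for key, value in sorted(flatten_stats(sample).items()):
    print(f"{key} = {value}")
```

Flattening keeps the sketch independent of the exact nesting, so it should still be usable once the real structure is known.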


From: Eduardo Afonso Ferreira <eafonsof@yahoo.com>
To: "oozie-users@incubator.apache.org" <oozie-users@incubator.apache.org>
Sent: Thursday, September 6, 2012 12:14 PM
Subject: Re: Capturing Pig action output

Hey, Virag,

I built and installed Oozie 3.2 from http://incubator.apache.org/oozie/Downloads.html.
I set the property oozie.action.external.stats.write to true in my workflow, then deployed and
submitted it. But I still don't see PigStats when I run the -info request (example below), and
I see exceptions related to org.json.simple.JSONObject (NoClassDefFoundError). Maybe it's a
build problem.

What would be the best way of getting version 3.2 up and running? Is there a pre-built package
that we could download and install, without having to build and package it ourselves and chase
down all sorts of dependencies?

eferreira@eferreira-tbs-desktop:~/projects/aspen-core/oozie/apps$ oozie job -oozie http://localhost:11000/oozie -info 0000197-120905170442968-oozie-oozi-W -verbose
Job ID : 0000197-120905170442968-oozie-oozi-W
Workflow Name : video_play_counts-wf
App Path      : hdfs://aspendevhdp1.cnn.vgtf.net:54310/user/eferreira/oozie/apps/video_play_counts
Status        : RUNNING
Run           : 0
User          : eferreira
Group         : -
Created       : 2012-09-06 14:53
Started       : 2012-09-06 14:53
Last Modified : 2012-09-06 14:53
Ended         : -
CoordAction ID: 0000196-120905170442968-oozie-oozi-C@1

ID    Console URL    Error Code    Error Message    External ID    External Status    Name    Retries    Tracker URI    Type    Started    Status
0000197-120905170442968-oozie-oozi-W@pig-node    http://aspendevhdp1.cnn.vgtf.net:50030/jobdetails.jsp?jobid=job_201208071502_69799    -    -    job_201208071502_69799    RUNNING    pig-node    0    aspendevhdp1.cnn.vgtf.net:54311    pig    2012-09-06 14:53    RUNNING    -

From: Virag Kothari <virag@yahoo-inc.com>
To: "oozie-users@incubator.apache.org" <oozie-users@incubator.apache.org>; Eduardo Afonso Ferreira <eafonsof@yahoo.com>
Sent: Thursday, August 30, 2012 2:59 PM
Subject: Re: Capturing Pig action output


From 3.2 onwards, counters and hadoop job ids for Pig and Map-reduce can
be accessed through the API or EL function.

First, the following should be set in wf configuration. This will store
the Pig/MR related statistics in the DB.

Then, the stats and jobIds can be accessed using the verbose API
oozie job -info <jobId> -verbose
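
The same job information is also exposed by the Oozie Web Services API, which may be easier to consume from a script than the CLI output. Below is a minimal Python sketch under two assumptions: that the server offers the v1 endpoint /v1/job/<jobId>?show=info, and that each entry in the response's "actions" array carries "name" and "externalId" fields. Both should be checked against the Web Services API documentation for the installed version. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def job_info_url(oozie_url, job_id):
    """Build the job-info URL (assumed v1 Web Services endpoint)."""
    return f"{oozie_url.rstrip('/')}/v1/job/{quote(job_id)}?show=info"


def fetch_job_info(oozie_url, job_id, timeout=30):
    """Fetch the job description as a dict; requires a reachable Oozie server."""
    with urlopen(job_info_url(oozie_url, job_id), timeout=timeout) as resp:
        return json.load(resp)


def action_external_ids(job_info):
    """Map action name -> external (Hadoop) job ID.

    The "actions", "name" and "externalId" field names are assumptions.
    """
    return {
        action.get("name"): action.get("externalId")
        for action in job_info.get("actions", [])
    }
```

For example, fetch_job_info("http://localhost:11000/oozie", "<jobId>") followed by action_external_ids(...) would give the launcher job ID per action, provided the field names match.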

Also, the Hadoop job IDs can be retrieved for a Pig action through


Detailed docs at 
onalSpec.html. Look under "4.2.5 Hadoop EL Functions"


On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <eafonsof@yahoo.com> wrote:

>Hi there,
>I have a Pig script that is run periodically by Oozie via a coordinator with a set
>I wanted to capture the Pig script output because I need to look at some
>information on the results to keep track of several things.
>I know I can look at the output by doing a whole bunch of clicks starting
>at the oozie web console as follows:
>- Open oozie web console (ex.: http://localhost:11000/oozie/)
>- Find and click the specific job under "Workflow Jobs"
>- Select (click) the pig action in the window that pops up
>- Click the magnifying glass icon on the "Console URL" field
>- Click the Map of the launcher job
>- Click the task ID
>- Click All under "Task Logs"
>My question is: how can I know the exact name and location of that log
>file in HDFS so I can programmatically retrieve the file from HDFS,
>parse it and look for what I need?
>Is this something I can determine ahead of time, like pass a
>parameter/argument to the action/pig so that it will store the log where
>I want with the file name I want?
>Thanks in advance for your help.