ambari-user mailing list archives

From Billie Rinaldi <billie.rina...@gmail.com>
Subject Re: Jobs view .. how to hook into it....
Date Wed, 29 Jan 2014 21:28:05 GMT
On Wed, Jan 29, 2014 at 10:29 AM, Aaron Cody <acody@hexiscyber.com> wrote:

> yes both of those things... and maybe a bit more explanation on how they
> were implemented for Hive/Pig ...
>

Let's say you have a workflow that consists of 3 MapReduce jobs.  Maybe the
workflow is a specific Hive query, or Pig script, or maybe you just have
your own script that kicks off the 3 jobs.   You run this workflow
repeatedly, and you want to be able to evaluate the relative performance of
different runs of the entire workflow as a whole -- maybe one of the jobs
is slow sometimes, but you don't know which one, or why.  To group together
the MR jobs for a particular run of the workflow, you assign each run a
unique ID, e.g. appname_run0001.  Then when you're configuring the MR jobs,
you add this ID to the job conf under the mapreduce.workflow.id property.
You probably actually have multiple types of workflows (like different Hive
queries or Pig scripts that you run repeatedly), so you can give each
workflow type a name (mapreduce.workflow.name) and use that to filter your
workflows in the web app.
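
As a rough sketch of the two properties above: the snippet below uses a plain Map as a stand-in for the Hadoop job configuration, so it runs without Hadoop on the classpath. In a real job these would be conf.set(...) calls on the job's Configuration. The class name, the helper names, and the "nightly-report" workflow name are made up for illustration; only the two property names and the appname_run0001 ID format come from the explanation above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkflowIdSketch {
    // Hypothetical helper: builds a per-run ID such as appname_run0001.
    static String runId(String appName, int runNumber) {
        return String.format("%s_run%04d", appName, runNumber);
    }

    // Properties that every MR job belonging to one run of the workflow
    // would carry. The Map is a stand-in for the Hadoop job conf.
    static Map<String, String> workflowProps(String workflowName,
                                             String appName,
                                             int runNumber) {
        Map<String, String> props = new LinkedHashMap<>();
        // Unique per run: groups the MR jobs of this run together.
        props.put("mapreduce.workflow.id", runId(appName, runNumber));
        // Shared by all runs of the same workflow type: used for filtering.
        props.put("mapreduce.workflow.name", workflowName);
        return props;
    }

    public static void main(String[] args) {
        workflowProps("nightly-report", "appname", 1)
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The ID changes on every run while the name stays fixed, which is what lets you compare different runs of the same workflow type.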

Let's say in your 3-job workflow, job A runs first, then job B runs on the
output of job A, then job C uses the output of both A and B.  You can
capture these dependencies by using the adjacency properties, so that the
web app can display the jobs in a DAG.  The last piece needed to make the
DAG work is that we have to know whether a particular MR job is an instance
of A, B, or C.  You specify this in the job conf by setting the
mapreduce.workflow.node.name property.  The job identifiers I'm using here
are single letters, but they could be anything.  Hive uses its internal
stage identifiers, and Pig uses some kind of counter.

The following shows B and C depending on A, and C depending on B:

conf.setStrings("mapreduce.workflow.adjacency.A", new String[]{"B", "C"});
conf.setStrings("mapreduce.workflow.adjacency.B", new String[]{"C"});
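
To show how the node name and the adjacencies fit together for the A/B/C example, here is a minimal sketch. It again uses a Map in place of the Hadoop job conf so it can run standalone. It assumes that the same adjacency properties are set on every job in the workflow and that only mapreduce.workflow.node.name differs per job; that is my reading of the description, not something checked against the Hive or Pig code. The class and method names are hypothetical. Values are joined with commas because that is how Configuration.setStrings stores a string array.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkflowDagSketch {
    // Builds the workflow properties for one MR job.
    // The Map is a stand-in for the Hadoop job conf.
    static Map<String, String> jobProps(String nodeName,
                                        Map<String, List<String>> dag) {
        Map<String, String> props = new LinkedHashMap<>();
        // Which node of the workflow DAG this particular job is.
        props.put("mapreduce.workflow.node.name", nodeName);
        // One adjacency property per node that has downstream jobs
        // (assumption: the full list is repeated on every job).
        for (Map.Entry<String, List<String>> e : dag.entrySet()) {
            props.put("mapreduce.workflow.adjacency." + e.getKey(),
                      String.join(",", e.getValue()));
        }
        return props;
    }

    public static void main(String[] args) {
        // A feeds B and C; B feeds C.
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("A", Arrays.asList("B", "C"));
        dag.put("B", Arrays.asList("C"));

        for (String node : Arrays.asList("A", "B", "C")) {
            System.out.println(jobProps(node, dag));
        }
    }
}
```

Job C has no adjacency property of its own because nothing depends on it; it only appears as a target of A and B.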

For Pig's implementation, look for mapreduce.workflow in this file:
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRScriptState.java

For Hive's implementation, look for mapreduce.workflow in this file:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
As well as the setWorkflowAdjacencies method in this file:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


> Also, the workflow.workflowcontext column ... looks like a blob of JSON
> which I guess ends up in some model in the web app? but how to construct
> it..? (the regex code in MapReduceJobHistoryUpdater.java is not exactly
> straightforward :)  )
>

A MapReduce workflow-producing app doesn't have to construct the object.
MapReduceJobHistoryUpdater does it for you based on the mapreduce.workflow
properties the app sets in the job configuration.  The regexes it uses only
strip off escape characters that JobHistory adds when it logs the
information.  If you want a Java representation of the JSON, it is a
WorkflowContext object.  WorkflowContext is present in both ambari-log4j
and ambari-server.


> thanks
> A
>
> From: Billie Rinaldi <billie.rinaldi@gmail.com>
> Reply-To: <user@ambari.apache.org>
> Date: Wed, 29 Jan 2014 07:03:32 -0800
>
> To: <user@ambari.apache.org>
> Subject: Re: Jobs view .. how to hook into it....
>
> Sure.  Which part is confusing?  The adjacencies?  Or why you would use it
> at all?
>
>
> On Tue, Jan 28, 2014 at 4:47 PM, Aaron Cody <acody@hexiscyber.com> wrote:
>
>> thanks Billie - do you think you could go into a little more detail about
>> the workflow DAG stuff on the wiki? it's a little cryptic (to me anyway)  :)
>>
>> From: Billie Rinaldi <billie.rinaldi@gmail.com>
>> Reply-To: <user@ambari.apache.org>
>> Date: Mon, 20 Jan 2014 07:40:04 -0800
>> To: <user@ambari.apache.org>
>> Subject: Re: Jobs view .. how to hook into it....
>>
>> In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory
>> that inserts the data into postgres (or whichever db you have configured).
>> The code is in contrib/ambari-log4j.
>>
>> Billie
>>
>>
>> On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <acody@hexiscyber.com> wrote:
>>
>>> hello
>>> I'm looking at integrating my own process into the Ambari 'Jobs' view ...
>>> and I can see how the web side of things works .. i.e. the view makes REST
>>> calls to the server which in turn results in a query to postgres to get the
>>> job stats ... but what is not so clear is how those job/task stats get into
>>> postgres in the first place....
>>> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting
>>> the job/task info into postgres directly? Or is there some other mechanism
>>> in Ambari that is listening for map reduce jobs/tasks to start/finish?
>>>
>>> any hints on where to look in the source tree would be greatly
>>> appreciated
>>> TIA
>>>
>>
>>
>
