incubator-oozie-users mailing list archives

From Max Hansmire <hansm...@gmail.com>
Subject Re: Question about Dates
Date Fri, 06 Apr 2012 18:03:29 GMT
I have a follow-up to this question.

Many of my coordinators have data outputs set up like this:

    <output-events>
        <data-out name="output" dataset="output">
            <instance>${coord:current(-1)}</instance>
        </data-out>
    </output-events>

and then pass the output to a workflow like this:

    <property>
        <name>outputDir</name>
        <value>${coord:dataOut('output')}</value>
    </property>

When I use the trick that you describe below, I get this error: "variable [outputDir] cannot
be resolved". One solution I can think of is to replace ${coord:current(-1)} with
${coord:current(0)}, but that does not really make sense. I am processing yesterday's
data, so I feel the output directory should be labeled with yesterday's date.

Any tips you have would be great. For now, I will start each coordinator one day earlier than
I actually want it to run.

Max
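
A minimal sketch of why ${coord:current(-1)} may fail to resolve on the first run. It assumes that Oozie picks the latest dataset instance at or before the action's nominal time, offsets it by n, and produces nothing when the result falls before initial-instance. That assumption is consistent with the error above but is not confirmed in this thread. The function name `current_instance` is hypothetical, and the fixed 24-hour frequency ignores the daylight-saving handling that ${coord:days(1)} performs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def current_instance(initial_instance: datetime,
                     nominal_time: datetime,
                     n: int,
                     frequency: timedelta = timedelta(days=1)) -> Optional[datetime]:
    """Approximate coord:current(n) for a synchronous dataset.

    Assumptions (not taken from the Oozie source): fixed-length frequency,
    and instances before initial_instance resolve to nothing (None).
    """
    if nominal_time < initial_instance:
        return None
    # Index of the latest dataset instance at or before the nominal time.
    latest_index = (nominal_time - initial_instance) // frequency
    index = latest_index + n
    if index < 0:
        # The requested instance would precede initial-instance.
        return None
    return initial_instance + index * frequency


if __name__ == "__main__":
    # Hypothetical MyStartTime, used for both start and initial-instance.
    start = datetime(2012, 4, 1, 5, 0, tzinfo=timezone.utc)
    print(current_instance(start, start, 0))                        # first action, current(0)
    print(current_instance(start, start, -1))                       # first action, current(-1)
    print(current_instance(start, start + timedelta(days=1), -1))   # second action, current(-1)
```

Under this model, only the first action is affected when the coordinator start and initial-instance are identical: current(-1) has no earlier instance to point at, while every later action resolves normally.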

On Mar 7, 2012, at 8:33 AM, Max Hansmire wrote:

> Thanks Mohammad.
> 
> On Mar 7, 2012, at 2:53 AM, Mohammad Islam wrote:
> 
>> The better option is to define a variable such as MyStartTime at job submission
>> and use it as the value of both the coordinator start time and the dataset initial-instance.
>> 
>> For example ..
>> <coordinator-app start="${MyStartTime}" ...>
>> 
>> <dataset initial-instance="${MyStartTime}">
>> 
>> This will give you a lot of flexibility.
>> 
>> You can define MyStartTime in either of the following ways:
>> 1. In the job.properties file, add the line MyStartTime=2011-05-01T05:00Z
>> 2. On the oozie command line: oozie job -run -config ??.properties -DMyStartTime=2011-05-01T05:00Z
>> 
>> Regards,
>> Mohammad
>> 
>> 
>> 
>> ----- Original Message -----
>> From: Max Hansmire <hansmire@gmail.com>
>> To: Mohammad Islam <mislam77@yahoo.com>
>> Cc: "oozie-users@incubator.apache.org" <oozie-users@incubator.apache.org>
>> Sent: Tuesday, March 6, 2012 8:23 PM
>> Subject: Re: Question about Dates
>> 
>> No, they are not. Thanks for the help. Is there a mechanism for keeping these in
>> sync, or is it just a matter of doing it manually?
>> 
>> My datasets are defined in a separate file from the coordinator.
>> 
>> Max
>> On Mar 6, 2012, at 11:09 PM, Mohammad Islam wrote:
>> 
>>> Hi Max,
>>> The "start" attribute of the coordinator and the "initial-instance" of the output
>>> dataset definition should be the same. Are they the same?
>>> 
>>> Regards,
>>> Mohammad
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Max Hansmire <hansmire@gmail.com>
>>> To: oozie-users@incubator.apache.org
>>> Cc: 
>>> Sent: Tuesday, March 6, 2012 6:25 AM
>>> Subject: Question about Dates
>>> 
>>> I am having problems understanding dates in Oozie. The nominal time of my
>>> coordinator does not always match the output directory of my coordinator.
>>> 
>>> Here is some data taken from the runtime properties of my workflow. The runDate
>>> is the nominal time of the workflow. The outputDir is taken from the output event
>>> that uses ${coord:current(0)}.
>>> 
>>>   <property>
>>>     <name>runDate</name>
>>>     <value>2012-03-04</value>
>>>   </property>
>>>   <property>
>>>     <name>outputDir</name>
>>>     <value>hdfs://prodhpmaster01n:56310/user/hive/stamps/stamp_in_question/ds=2012-03-03</value>
>>>   </property>
>>> 
>>> Here is the dataset definition.
>>> 
>>>     <dataset name="example" frequency="${coord:days(1)}"
>>>         initial-instance="2011-05-01T05:00Z" timezone="America/New_York">
>>>         <uri-template>${nameNode}/user/hive/stamps/stamp_in_question/ds=${YEAR}-${MONTH}-${DAY}
>>>         </uri-template>
>>>         <done-flag></done-flag>
>>>     </dataset>
>>> 
>>> The start time of the coordinator is 07:00Z and its frequency is frequency="${coord:days(1)}".
>>> 
>>> I want the date on the outputDir to match the runDate. What is the best way to
>>> achieve that? In particular, I want to know how Oozie chooses the date to use with an
>>> output event. 07:00Z (the coordinator start time) is well past the 05:00Z initial instance
>>> of the dataset, so it seems like they should match up. I suspect that I am thinking about
>>> this all wrong, though.
>>> 
>>> Max
> 

