hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: [WAS Re: periodic execution] Oozie Hive action
Date Thu, 17 Feb 2011 01:14:14 GMT
On Wed, Feb 16, 2011 at 8:11 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
> An update on this.
> I've finished the changes to the Oozie Hive action so that it works with Hive 0.7.
> As mentioned before the problem is that not all needed Hive & dependent JARs
> are available in public Maven repos.
> Early next week the Cloudera Maven repositories should have beta versions of
> these JARs (currently I'm building against SNAPSHOTs).
> As soon as the beta JARs are available I'll post a patch using those JAR
> versions.
> Thanks.
> Alejandro
> On Thu, Feb 10, 2011 at 4:51 PM, Alejandro Abdelnur <tucu@cloudera.com>
> wrote:
>>
>> Hi Balaji,
>> The latest patch of the Hive action does not bundle hive-default.xml (got
>> same feedback from Carl); you'll be responsible for bundling it in the WF
>> directory until the Hive JARs bundle it.
>> I'll upload the new patch early next week and then ask Oozie to integrate
>> it.
>> Still, the problem I have is that, AFAIK, not all Hadoop and Hive JARs are
>> available in the public Maven repositories currently used by the Oozie build. I'll
>> submit as part of the PR a separate commit that configures the Oozie build to
>> pull from Cloudera's Maven repositories, where all JARs are available.
>> Thanks.
>> Alejandro
>> On Thu, Feb 10, 2011 at 4:34 PM, Balaji Rajagopalan
>> <balajirg@yahoo-inc.com> wrote:
>>>
>>> Alejandro,
>>>
>>>    I have used your hive action patch from tucu’s forked branch in yahoo
>>> github and it works fine. When will your patch be available in the master
>>> branch of yahoo github?  Also I have a small suggestion if I may:
>>> hive-default.xml is bundled with the oozie-core.jar; instead, can we have
>>> hive-default.xml in the same folder as workflow.xml in HDFS, so when I
>>> change hive-default.xml I don’t have to bundle the jar again?
>>>
>>>
>>>
>>> Regards,
>>>
>>> Balaji
>>>
>>>
>>>
>>> From: Alejandro Abdelnur [mailto:tucu@cloudera.com]
>>> Sent: Thursday, February 10, 2011 3:12 AM
>>> To: user@hive.apache.org
>>> Subject: Re: periodic execution
>>>
>>>
>>>
>>> Hi Cam,
>>>
>>>
>>>
>>> A bit of information that may be useful for you: Cloudera's Oozie has a
>>> Hive action that you can use from workflow jobs.
>>>
>>>
>>>
>>> Cheers
>>>
>>>
>>>
>>> Alejandro
>>>
>>>
>>>
>>> On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz <cambazz@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I am looking over oozie's coordinator. But meanwhile, I managed to
>>> write a simple java program to connect to hive using jdbc.
>>>
>>> I can import data and execute queries.
>>>
>>> I was wondering: for doing workflows, one needs to keep
>>> metadata, i.e. which file or partition was processed last, etc.
>>>
>>> I could do this usually using a database like db4o, and keeping a static
>>> file.
>>>
>>> Is the derby database that comes with hive for this purpose? How do
>>> people usually store state when using a hive application?
>>>
>>> best regards,
>>> -C.B.
>>>
>>> On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <hammer@cloudera.com>
>>> wrote:
>>> > Hey Cam,
>>> > You should use Oozie's
>>> > Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
>>> > Regards,
>>> > Jeff
>>> >
>>> > On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <cambazz@gmail.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> What kind of strategy should i follow in order to periodically run
>>> >> certain things?
>>> >>
>>> >> For example, each hour, i want to look up log files from certain dir,
>>> >> and for new files, i need to run:
>>> >>
>>> >> load data local inpath '/home/cam/logs/log.2011310120' into table
>>> >> item_view_raw partition (date_hour=2011310120);
>>> >>
>>> >> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>>> >> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>>> >> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>>> >> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>>> >> ivr.date_hour='2011310120';
>>> >>
>>> >> obviously, i need to deduce which files are new, iterate over them,
>>> >> and extract the time key, which will be used as a partition name, in
>>> >> this case is: 2011310120
>>> >>
>>> >> It seems like i can write a java program to deal with the
>>> >> synchronization of all these tasks, but i was wondering, what would you
>>> >> guys suggest?
>>> >>
>>> >> Any ideas/recommendations/help greatly appreciated
>>> >>
>>> >> Best Regards,
>>> >> C.B.
>>> >
>>> >
>>>
>>>
>
>

Did support for hive variables (0.7.0) make it into this version of
the oozie-action?
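
For reference, the hourly load described at the start of this thread (find new log files, extract the time key, run the two statements per file) could be sketched roughly as below. This is only an illustration, not something posted by anyone in the thread: the function names, the `log.<date_hour>` file-name pattern, and the plain-text state file are all assumptions. It only builds the HiveQL strings; running them (through JDBC, the Hive CLI, or an Oozie Hive action) is left out.

```python
import re
from pathlib import Path

# Assumption: log files are named log.<date_hour>, e.g. log.2011310120,
# as in the example path quoted in the thread.
LOG_NAME = re.compile(r"^log\.(\d+)$")


def new_partitions(log_dir, state_file):
    """Return (path, date_hour) pairs for log files not yet listed in state_file."""
    state = Path(state_file)
    done = set(state.read_text().split()) if state.exists() else set()
    found = []
    for path in sorted(Path(log_dir).iterdir()):
        match = LOG_NAME.match(path.name)
        if match and path.name not in done:
            found.append((path, match.group(1)))
    return found


def hive_statements(path, date_hour):
    """Build the two statements from the original question for one log file."""
    load = (
        f"LOAD DATA LOCAL INPATH '{path.as_posix()}' INTO TABLE item_view_raw "
        f"PARTITION (date_hour={date_hour});"
    )
    insert = (
        "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view "
        f"PARTITION (date_hour={date_hour}) "
        "SELECT ivr.view_time, ivr.ip_number, ivr.session_id, ivr.session_cookie, "
        "ivr.eser_sid, ivr.sale_status, ivr.maker_name, ivr.title "
        f"WHERE ivr.log_tag = 'PROD' AND ivr.date_hour='{date_hour}';"
    )
    return [load, insert]


def mark_done(path, state_file):
    """Record a file name as processed (hypothetical one-name-per-line state file)."""
    with open(state_file, "a") as handle:
        handle.write(path.name + "\n")
```

A driver would call `new_partitions`, execute the statements from `hive_statements` for each pair, and call `mark_done` only after both statements succeed, so a failed hour is retried on the next run.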
