hive-user mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: periodic execution
Date Wed, 09 Feb 2011 03:23:44 GMT

Hey Cam,

You should use Oozie's Coordinator; see
https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases

Regards,
Jeff

On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <cambazz@gmail.com> wrote:

> Hello,
>
> What kind of strategy should I follow in order to run certain things
> periodically?
>
> For example, each hour, I want to look up log files from a certain dir,
> and for new files, I need to run:
>
> load data local inpath '/home/cam/logs/log.2011310120' into table
> item_view_raw partition (date_hour=2011310120);
>
> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
> ivr.date_hour='2011310120';
>
> Obviously, I need to deduce which files are new, iterate over them,
> and extract the time key, which will be used as the partition name; in
> this case it is 2011310120.
>
> It seems like I can write a Java program to deal with the
> synchronization of all these tasks, but I was wondering, what would you
> guys suggest?
>
> Any ideas/recommendations/help greatly appreciated.
>
> Best Regards,
> C.B.
>
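
The per-file steps described in the quoted message (find new files, extract the time key, build the two statements) could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of either message: it assumes log files are named log.<digits> as in the example path, and the helper names (time_key, new_log_files, hive_statements) and the processed_keys set are hypothetical. How the set of already-loaded keys is persisted is left open.

```python
import re
from pathlib import Path

# Assumption: log files are named "log.<digits>", e.g. log.2011310120.
LOG_NAME = re.compile(r"^log\.(\d+)$")


def time_key(path):
    """Return the digit suffix of a file named log.<digits>, else None."""
    match = LOG_NAME.match(Path(path).name)
    return match.group(1) if match else None


def new_log_files(log_dir, processed_keys):
    """Yield (path, key) for log files whose key is not in processed_keys.

    processed_keys is a hypothetical set of keys that were already loaded;
    where it is stored (a file, a table, ...) is up to the caller.
    """
    for path in sorted(Path(log_dir).iterdir()):
        key = time_key(path)
        if key is not None and key not in processed_keys:
            yield path, key


def hive_statements(path, key):
    """Build the two statements from the quoted message for one log file."""
    load = (
        f"LOAD DATA LOCAL INPATH '{path}' INTO TABLE item_view_raw "
        f"PARTITION (date_hour={key});"
    )
    insert = (
        "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view "
        f"PARTITION (date_hour={key}) "
        "SELECT ivr.view_time, ivr.ip_number, ivr.session_id, "
        "ivr.session_cookie, ivr.eser_sid, ivr.sale_status, "
        "ivr.maker_name, ivr.title "
        f"WHERE ivr.log_tag = 'PROD' AND ivr.date_hour = '{key}';"
    )
    return [load, insert]
```

The generated statements could then be handed to the Hive CLI (for example via hive -e) from an hourly job, whether that job is triggered by cron or by an Oozie coordinator as suggested above.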
