hive-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: periodic execution
Date Wed, 09 Feb 2011 21:41:48 GMT
Hi Cam,

A bit of information that may be useful for you: Cloudera's Oozie has a Hive
action that you can use from workflow jobs.

Cheers

Alejandro

On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz <cambazz@gmail.com> wrote:

> Hello,
>
> I am looking over Oozie's Coordinator. But meanwhile, I managed to
> write a simple Java program to connect to Hive using JDBC.
>
> I can import data and execute queries.
>
> I was wondering: for doing workflows, one needs to keep some
> metadata, i.e. which was the last file or partition processed, etc.
>
> I could do this usually using a database like db4o, and keeping a static
> file.
>
> Is the Derby database that comes with Hive meant for this purpose? How do
> people usually store state when using a Hive application?
>
> best regards,
> -C.B.
>
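
On the question of where to keep state: the Derby database bundled with Hive backs the metastore (table and partition definitions), so application state is usually kept somewhere separate. Below is a minimal sketch of the "static file" option, assuming a JSON file that lists the log files already processed. The function names and the state file layout are hypothetical, not something Hive or Oozie provides.

```python
import json
import os
import tempfile


def load_processed(state_path):
    """Return the set of file names already recorded as processed."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as f:
        return set(json.load(f))


def mark_processed(state_path, file_name):
    """Add file_name to the state file, replacing the file atomically."""
    done = load_processed(state_path)
    done.add(file_name)
    directory = os.path.dirname(os.path.abspath(state_path))
    # Write to a temporary file in the same directory, then rename it over
    # the old state file so a crash never leaves a half-written file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, state_path)
```

Calling mark_processed only after the Hive statements for a file have succeeded means a failed run is simply retried on the next pass.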
> On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <hammer@cloudera.com>
> wrote:
> > Hey Cam,
> > You should use Oozie's
> > Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> > Regards,
> > Jeff
> >
> > On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <cambazz@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> What kind of strategy should I follow in order to run certain
> >> things periodically?
> >>
> >> For example, each hour, i want to look up log files from certain dir,
> >> and for new files, i need to run:
> >>
> >> load data local inpath '/home/cam/logs/log.2011310120' into table
> >> item_view_raw partition (date_hour=2011310120);
> >>
> >> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
> >> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
> >> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
> >> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
> >> ivr.date_hour='2011310120';
> >>
> >> Obviously, I need to deduce which files are new, iterate over them,
> >> and extract the time key, which will be used as the partition name;
> >> in this case it is 2011310120.
> >>
> >> It seems like I can write a Java program to deal with the
> >> synchronization of all these tasks, but I was wondering, what would you
> >> guys suggest?
> >>
> >> Any ideas/recommendations/help greatly appreciated.
> >>
> >> Best Regards,
> >> C.B.
> >
> >
>
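
For the steps in the original question (find new files, extract the time key, run the two statements per file), here is a rough sketch in Python. It assumes log files are named log.<digits> as in the example path, and it only builds the HiveQL text; how the statements are submitted (JDBC, the hive CLI, or an Oozie Hive action) is left open. The helper names are hypothetical. The table and column names are copied from the statements in the original message.

```python
import os
import re

# Assumed naming scheme, taken from the example: log.2011310120
LOG_NAME = re.compile(r"^log\.(\d+)$")


def partition_key(file_name):
    """Return the time key embedded in a log file name, or None."""
    match = LOG_NAME.match(file_name)
    return match.group(1) if match else None


def new_log_files(file_names, already_processed):
    """Return log files not yet processed, in sorted (chronological) order."""
    return [
        name
        for name in sorted(file_names)
        if partition_key(name) is not None and name not in already_processed
    ]


def hive_statements(log_dir, file_name):
    """Build the LOAD and INSERT OVERWRITE statements for one log file."""
    key = partition_key(file_name)
    if key is None:
        raise ValueError("not a log file: %r" % file_name)
    path = os.path.join(log_dir, file_name)
    # key is digits only (enforced by LOG_NAME), so interpolating it is safe.
    load = (
        "LOAD DATA LOCAL INPATH '%s' INTO TABLE item_view_raw "
        "PARTITION (date_hour=%s)" % (path, key)
    )
    insert = (
        "FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view "
        "PARTITION (date_hour=%s) "
        "SELECT ivr.view_time, ivr.ip_number, ivr.session_id, "
        "ivr.session_cookie, ivr.eser_sid, ivr.sale_status, "
        "ivr.maker_name, ivr.title "
        "WHERE ivr.log_tag = 'PROD' AND ivr.date_hour='%s'" % (key, key)
    )
    return [load, insert]
```

An hourly run would pass os.listdir('/home/cam/logs') and the set of already processed names to new_log_files, then execute the statements from hive_statements for each returned file in order.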
