hive-user mailing list archives

From Cam Bazz
Subject Re: periodic execution
Date Wed, 09 Feb 2011 03:44:58 GMT

I am looking into Oozie's coordinator. In the meantime, I managed to
write a simple Java program that connects to Hive over JDBC.

I can import data and execute queries.

For workflows, though, one needs to keep some metadata, i.e. which
file or partition was processed last, etc.

I could do this the usual way, using a database like db4o and keeping a static file.

Is the Derby database that comes with Hive meant for this purpose? How do
people usually store state when using a Hive application?
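
As an illustration of the "static file" option mentioned above, here is a minimal Java sketch that records each processed log file name in a plain-text state file, one name per line. The class name `ProcessedLog`, its method names, and the file layout are all assumptions made for this example; none of them come from Hive or Oozie.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: remembers which log files have already been loaded.
final class ProcessedLog {
    private final Path stateFile;
    private final Set<String> done = new TreeSet<>();

    // Reads any previously recorded names; a missing file means nothing was processed yet.
    ProcessedLog(Path stateFile) throws IOException {
        this.stateFile = stateFile;
        if (Files.exists(stateFile)) {
            done.addAll(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
        }
    }

    boolean isProcessed(String fileName) {
        return done.contains(fileName);
    }

    // Appends the name to the state file the first time it is seen.
    void markProcessed(String fileName) throws IOException {
        if (done.add(fileName)) {
            Files.write(stateFile, Collections.singletonList(fileName),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

A scheduled job could call `isProcessed` before loading a file and `markProcessed` only after the Hive statements for that file have succeeded, so that a failed run is retried the next time.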

Best regards,

On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher wrote:
> Hey Cam,
> You should use Oozie's
> Coordinator:
> Regards,
> Jeff
> On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz wrote:
>> Hello,
>> What kind of strategy should I follow in order to run certain
>> things periodically?
>> For example, each hour I want to look for log files in a certain directory,
>> and for new files, I need to run:
>> load data local inpath '/home/cam/logs/log.2011310120' into table
>> item_view_raw partition (date_hour=2011310120);
>> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
>> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
>> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
>> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
>> ivr.date_hour='2011310120';
>> Obviously, I need to work out which files are new, iterate over them,
>> and extract the time key, which will be used as the partition name; in
>> this case it is 2011310120.
>> It seems I could write a Java program to handle the
>> synchronization of all these tasks, but I was wondering what you
>> would suggest.
>> Any ideas/recommendations/help greatly appreciated.
>> Best Regards,
>> C.B.
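
The steps described in the quoted message (deduce which files are new, extract the time key, build the load statement) could be sketched in Java roughly as follows. This is only a sketch under stated assumptions: the class `HourlyLoad` and its methods are hypothetical, and the file naming scheme (`log.` followed by a 10-digit key) is inferred from the single example path in the message. The table name `item_view_raw` and partition column `date_hour` are taken from the quoted statements.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the hourly load described in the thread.
final class HourlyLoad {
    // Assumed naming scheme: "log." plus a 10-digit time key, e.g. log.2011310120.
    private static final Pattern LOG_NAME = Pattern.compile("log\\.(\\d{10})");

    // Returns the time key if the file name matches the assumed scheme.
    static Optional<String> partitionKey(String fileName) {
        Matcher m = LOG_NAME.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Keys of files that match the scheme and have not been loaded yet, in sorted order.
    static List<String> newKeys(Collection<String> fileNames, Set<String> alreadyLoaded) {
        List<String> keys = new ArrayList<>();
        for (String name : fileNames) {
            partitionKey(name)
                    .filter(k -> !alreadyLoaded.contains(k))
                    .ifPresent(keys::add);
        }
        Collections.sort(keys);
        return keys;
    }

    // Builds the LOAD DATA statement; the key is digits only, so plain concatenation is safe here.
    static String loadStatement(String dir, String key) {
        return "LOAD DATA LOCAL INPATH '" + dir + "/log." + key
                + "' INTO TABLE item_view_raw PARTITION (date_hour=" + key + ")";
    }
}
```

Each resulting string could then be passed to a JDBC `Statement.execute` call on an existing Hive connection, followed by the `INSERT OVERWRITE` query for the same key.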
