hive-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: periodic execution
Date Thu, 10 Feb 2011 08:51:55 GMT
Hi Balaji,

The latest patch of the Hive action does not bundle hive-default.xml (I got
the same feedback from Carl), so you'll be responsible for bundling it in the
WF directory until the Hive JARs bundle it.

I'll upload the new patch early next week and then ask Oozie to integrate
it.

The remaining problem is that, AFAIK, not all Hadoop and Hive JARs are
available in the public Maven repositories currently used by the Oozie build.
As part of the PR, I'll submit a separate commit that configures the Oozie
build to pull from Cloudera's Maven repositories, where all the JARs are
available.

Thanks.

Alejandro

On Thu, Feb 10, 2011 at 4:34 PM, Balaji Rajagopalan
<balajirg@yahoo-inc.com> wrote:

> Alejandro,
>
>    I have used your Hive action patch from tucu’s forked branch on Yahoo's
> GitHub and it works fine. When will your patch be available in the master
> branch of Yahoo's GitHub? Also, I have a small suggestion, if I may:
> hive-default.xml is bundled with oozie-core.jar. Could we instead keep
> hive-default.xml in the same folder as workflow.xml in HDFS, so that when I
> change hive-default.xml I don’t have to bundle the JAR again?
>
>
>
> Regards,
>
> Balaji
>
>
>
> *From:* Alejandro Abdelnur [mailto:tucu@cloudera.com]
> *Sent:* Thursday, February 10, 2011 3:12 AM
> *To:* user@hive.apache.org
> *Subject:* Re: periodic execution
>
>
>
> Hi Cam,
>
>
>
> A bit of information that may be useful for you, Cloudera's Oozie has a
> Hive action that you can use from workflow jobs.
>
>
>
> Cheers
>
>
>
> Alejandro
>
>
>
> On Wed, Feb 9, 2011 at 11:44 AM, Cam Bazz <cambazz@gmail.com> wrote:
>
> Hello,
>
> I am looking over oozie's coordinator. But meanwhile, I managed to
> write a simple java program to connect to hive using jdbc.
>
> I can import data and execute queries.
>
> I was wondering: for doing workflows, one needs to keep metadata,
> i.e. which file or partition was processed last, etc.
>
> I could do this usually using a database like db4o, and keeping a static
> file.
>
> Is the Derby database that comes with Hive meant for this purpose? How do
> people usually store state when using a Hive application?
>
> best regards,
> -C.B.
>
>
> On Wed, Feb 9, 2011 at 5:23 AM, Jeff Hammerbacher <hammer@cloudera.com>
> wrote:
> > Hey Cam,
> > You should use Oozie's
> > Coordinator: https://github.com/yahoo/oozie/wiki/Oozie-Coord-Use-Cases.
> > Regards,
> > Jeff
> >
> > On Tue, Feb 8, 2011 at 4:29 PM, Cam Bazz <cambazz@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> What kind of strategy should I follow in order to run certain things
> >> periodically?
> >>
> >> For example, each hour, i want to look up log files from certain dir,
> >> and for new files, i need to run:
> >>
> >> load data local inpath '/home/cam/logs/log.2011310120' into table
> >> item_view_raw partition (date_hour=2011310120);
> >>
> >> FROM item_view_raw ivr INSERT OVERWRITE TABLE item_view partition
> >> (date_hour=2011310120) SELECT ivr.view_time, ivr.ip_number,
> >> ivr.session_id, ivr.session_cookie, ivr.eser_sid, ivr.sale_status,
> >> ivr.maker_name, ivr.title WHERE ivr.log_tag = 'PROD' and
> >> ivr.date_hour='2011310120';
> >>
> >> obviously, i need to deduce which files are new, iterate over them,
> >> and extract the time key, which will be used as a partition name, in
> >> this case is: 2011310120
> >>
> >> It seems like I can write a Java program to deal with the
> >> synchronization of all these tasks, but I was wondering, what would you
> >> guys suggest?
> >>
> >> Any ideas/recommendations/help greatly appreciated.
> >>
> >> Best Regards,
> >> C.B.
> >
> >
>
>
>
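
For reference, the bookkeeping described in the original question (work out
which log files are new, extract the time key, and use it as the partition
name) can be sketched in plain Java. This is only a minimal sketch under
stated assumptions, not anything taken from Oozie or Hive: the class
HourlyLoadPlanner, its method names, and the idea of tracking processed keys
in a Set are all hypothetical, and the file naming (log.<digits>) is inferred
from the single example path in the thread. It only builds the statement
text; running it over JDBC and persisting the processed keys are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: names and structure are illustrative assumptions.
public class HourlyLoadPlanner {
    // Assumed file naming, inferred from '/home/cam/logs/log.2011310120'.
    static final String PREFIX = "log.";

    // Extracts the time key (used as the partition value) from a file name.
    static String partitionKey(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        String key = fileName.substring(PREFIX.length());
        // Digits only, so the key is safe to splice into a statement.
        if (!key.matches("\\d+")) {
            throw new IllegalArgumentException("Non-numeric time key: " + key);
        }
        return key;
    }

    // Returns the keys of files not yet processed, in ascending order.
    static List<String> newKeys(Iterable<String> fileNames, Set<String> processedKeys) {
        Set<String> keys = new TreeSet<String>();
        for (String name : fileNames) {
            String key = partitionKey(name);
            if (!processedKeys.contains(key)) {
                keys.add(key);
            }
        }
        return new ArrayList<String>(keys);
    }

    // Builds the LOAD DATA statement from the thread for one time key.
    static String loadStatement(String logDir, String key) {
        return "LOAD DATA LOCAL INPATH '" + logDir + "/" + PREFIX + key
                + "' INTO TABLE item_view_raw PARTITION (date_hour=" + key + ")";
    }

    public static void main(String[] args) {
        // Stand-in for a directory listing and for previously stored state.
        List<String> files = Arrays.asList("log.2011310120", "log.2011310119");
        Set<String> processed = new HashSet<String>(Arrays.asList("2011310119"));
        for (String key : newKeys(files, processed)) {
            System.out.println(loadStatement("/home/cam/logs", key));
        }
    }
}
```

Where the set of processed keys lives (a local file, a small database, or
the scheduler's own state) is a separate choice; with an Oozie coordinator
the time key would typically come from the coordinator rather than from a
directory scan.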
