hive-user mailing list archives

From Edward Capriolo <>
Subject Re: Hive Transform Scripts Ending Cleanly
Date Fri, 21 Sep 2012 15:26:40 GMT
There is a setting in hive-site.xml that allows transform scripts to
continue even if they take a long time to return a single row.


On Fri, Sep 21, 2012 at 8:55 AM, John Omernik <> wrote:
> Greetings All -
> I have a transform script that does some awesome stuff (at least to my eyes)
> Basically, here is the SQL
>   SELECT TRANSFORM (filename)
>   USING '' as (col1, col2, col3, col4, col5)
>   FROM mysource_filetable
> is actually a wrapper script that
> looks like this:
> #!/bin/bash
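> # read -r keeps backslashes literal; IFS= preserves leading/trailing spaces
> while IFS= read -r line; do
>     filename=$line
>     # quote the filename so paths containing spaces survive word splitting
>     python /mnt/node_scripts/ -i "$filename" -o STDOUT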
> done
> The reason for calling the python script from a bash script is so I
> can read off stdin, process the data, and then shoot it off to standard OUT.
> There are some other reasons... but it works great, most of the time.
> Sometimes, for whatever reason, we have a situation where the hive
> "listener" (I don't know what else to call it) gets bored listening for
> data. The python script can take a long time depending on the data being
> sent to it.  It gives up listening for STDOUT, the task times out, and the
> job retries that file somewhere else where it succeeds. No big deal.
> However, the python script and the java that's calling it seem to still be
> running, using up resources. If it doesn't exit cleanly, it kinda wigs out
> and goes on to TRANSFORM THE WORLD (said in a loud echoing booming voice).
> Anywho, just curious if there are ways I can monitor for that. Perhaps check
> for things in my, maybe run python direct from hive? Settings in
> hive that will force kill the runaways?  Transform and its capabilities
> are AWESOME, but like much in hive, documentation is all over the place.
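
On the runaway question above, one option is to put the time limit in the wrapper itself, so a stuck worker is killed by its parent rather than left behind when the task is retried elsewhere. Below is a minimal sketch, assuming Python 3 is available on the task nodes. The names run_with_timeout and transform and the timeout value are hypothetical, not anything Hive provides; the -i and -o STDOUT flags are taken from the bash wrapper quoted above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its stdout as bytes, or None if it overran timeout_s."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child and waits for it before
        # re-raising, so the worker process is gone by the time we get here.
        return None
    return done.stdout


def transform(lines, worker_cmd, timeout_s, out):
    """Feed each filename in lines to worker_cmd; stop at the first failure.

    worker_cmd is the interpreter plus script path as a list (hypothetical
    parameter); out is a binary stream such as sys.stdout.buffer.
    Returns 0 on success and 1 on timeout or a closed output pipe.
    """
    for line in lines:
        filename = line.rstrip("\n")
        if not filename:
            continue  # skip blank input rows
        data = run_with_timeout(worker_cmd + ["-i", filename, "-o", "STDOUT"],
                                timeout_s)
        if data is None:
            sys.stderr.write("worker timed out on %s\n" % filename)
            return 1
        try:
            out.write(data)
            out.flush()
        except BrokenPipeError:
            # The reader has gone away; exit instead of processing more files.
            return 1
    return 0
```

To use it as the transform script, call transform(sys.stdin, worker_cmd, timeout_s, sys.stdout.buffer) and pass the result to sys.exit(). Two caveats: the timeout only kills the direct child, not any processes that child starts, and output is buffered per file rather than streamed, so it does nothing to keep a slow but healthy task from timing out on the Hive side.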
