hive-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Hive Transform Scripts Ending Cleanly
Date Fri, 21 Sep 2012 12:55:20 GMT
Greetings All -

I have a transform script that does some awesome stuff (at least to my eyes).

Basically, here is the SQL


  SELECT TRANSFORM (filename)
  USING 'worker.sh' AS (col1, col2, col3, col4, col5)
  FROM mysource_filetable


worker.sh is actually a wrapper script that looks like this:

#!/bin/bash

# Hive streams one input row (here, a filename) per line on stdin.
# For each row, hand the file to the parser and let its output flow
# straight back to Hive on stdout.
while read -r filename; do
    python /mnt/node_scripts/parser.py -i "$filename" -o STDOUT
done

The reason for calling the python script from a bash wrapper is so I
can read off stdin, process the data, and then shoot it back out on
stdout.  There are some other reasons... but it works great, most of the time.
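
For context, the contract the wrapper has to satisfy is just Hive's
default TRANSFORM row format: one input row per line on stdin, and each
line the script writes to stdout gets split back into tab-separated
columns (col1 through col5 in the query above). Something as dumb as
this (made-up values, just to show the shape) would be a legal worker:

#!/bin/bash
# Toy stand-in for the real parser call: for each filename Hive feeds
# us on stdin, emit one row of five tab-separated columns on stdout.
while read -r filename; do
    printf '%s\t%s\t%s\t%s\t%s\n' "$filename" "val2" "val3" "val4" "val5"
done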

Sometimes, for whatever reason, we hit a situation where the Hive
"listener" (I don't know what else to call it) gets bored waiting for
data. The python script can take a long time depending on the data being
sent to it.  Hive gives up listening for stdout, the task times out, and the
job retries that file somewhere else, where it succeeds. No big deal.
However, the python script and the java that's calling it seem to still be
running, using up resources. If it doesn't exit cleanly, it kinda wigs out
and goes on to TRANSFORM THE WORLD (said in a loud echoing booming voice).
 Anywho, just curious if there are ways I can monitor for that. Perhaps
check for things in my worker.sh, maybe run python directly from Hive?
Settings in Hive that will force-kill the runaways?  TRANSFORM and its
capabilities are AWESOME, but like much in Hive, the documentation is all
over the place.
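
One idea I'm considering for the worker.sh side (just a sketch, assuming
GNU coreutils timeout is installed on the task nodes; the 900s/30s limits
and the trap handling are illustrative, not anything Hive requires):

#!/bin/bash
# Hardened wrapper sketch: bound each parse, and make sure the child
# dies with the wrapper if Hive tears the task down.

cleanup() {
    [ -n "$child" ] && kill "$child" 2>/dev/null
}
trap 'cleanup; exit 143' TERM INT
trap cleanup EXIT

while read -r filename; do
    # SIGTERM the parser after 900s, escalate to SIGKILL 30s later.
    # stdin is redirected so the child can't eat rows meant for the loop.
    timeout --kill-after=30 900 \
        python /mnt/node_scripts/parser.py -i "$filename" -o STDOUT \
        < /dev/null &
    child=$!
    wait "$child"
done

On the Hadoop side, mapred.task.timeout is (as far as I can tell) the
setting that decides how long a task can go without reading input,
writing output, or reporting status before the framework kills the
attempt, but that only reaches the Java side; an orphaned python child
can keep running, which is why I'm leaning toward the trap above.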
