hadoop-common-user mailing list archives

From Hemanth Yamijala <yhema...@yahoo-inc.com>
Subject Re: HOD question wrt to the virtual cluster log files - where do they end up when the job ends
Date Thu, 28 Feb 2008 04:17:38 GMT
Jason Venner wrote:
> As you have all read from my previous emails, we are still pretty low 
> on the HOD learning curve.
>
That is understandable. HOD is new software, and we will improve it over 
time with feedback from users like you :-)
> We are having jobs that terminate and the virtual mapred cluster is 
> terminated also. We want access to the log files and job history from 
> the virtual cluster.
You've mentioned that you have a persistent DFS. So, you could do the 
following:

Set the following variable in your hodrc:
[hodring]
log-destination-uri = hdfs://<namenode-hostname>:<namenode-rpc-port>/user/hod/logs

If you have permissions enabled, you may need to make sure this 
directory is writable by all users who launch HOD jobs. When a job 
completes, the Hadoop logs are uploaded to this folder in DFS as zipped 
files, and you can access them from there. HOD's own logs are stored on 
the local file system under the log-dir configured in the [ringmaster] 
and [hodring] sections.

>
> Our other active research project is why are the virtual clusters 
> getting killed.
You mean you allocate, the allocation succeeds, and after a while the 
cluster is no longer active? If yes, please check the following:
- What's the default wallclock time set up in your Torque master? For example:
$ qmgr -c "p q batch"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
<snip>
set queue batch resources_default.walltime = 10000:00:00
</snip>

If this time limit is low, you could be hitting it, in which case the 
Torque server is killing the cluster. You can increase the wallclock 
time when you allocate a cluster. Use --hod.walltime <time in seconds> 
in the allocate command line:
hod --hod.walltime 3600 -o "allocate .."
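
Note that the queue default above is printed in colon-separated form, while --hod.walltime takes plain seconds. If it helps when comparing the two, here is a minimal sketch of the conversion. It assumes the Torque value is in [[HH:]MM:]SS form, and the function name walltime_to_seconds is just something made up for illustration, not part of HOD or Torque:

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert a colon-separated walltime string to seconds.

    Assumes the [[HH:]MM:]SS form, e.g. "10000:00:00", "30:00" or "90".
    Other formats are rejected rather than guessed at.
    """
    parts = walltime.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected walltime format: {walltime!r}")
    seconds = 0
    # Each field to the right is worth 1/60th of the one before it.
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    # The queue default shown in the qmgr output above.
    print(walltime_to_seconds("10000:00:00"))
    # One hour, i.e. the 3600 used in the allocate example.
    print(walltime_to_seconds("01:00:00"))
```

So a default of 10000:00:00 is far larger than the 3600 seconds in the example command; if your queue shows a much smaller value, that is the one to look at first.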

- Another possibility for auto-deallocation is that your cluster has not 
been running any Hadoop jobs for a long time. To free up nodes, HOD 
deallocates such idle clusters automatically.

>
> Anyway, how do we get access to the log files and job history from a 
> terminated virtual cluster.
>
History logs should also be uploaded to DFS, as explained above.

> Note: we run a persistent DFS, and allocate virtual mapred clusters.
>
> Thanks

