hadoop-common-dev mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-489) Seperating user logs from system logs in map reduce
Date Wed, 06 Sep 2006 00:58:24 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-489?page=comments#action_12432702 ] 
Mahadev konar commented on HADOOP-489:

Here's my take on this issue.

1) Put all the user logs in tasktracker.log.dir/job_id.log, so there would be just one
log file per job on a local machine.
The logs might be out of order when two or more tasks write to the log file simultaneously,
but I do not think that is a big issue. Having a log file per task seems like overkill, with
too many log files on local machines.

2) The user is shown all the fatal errors on his/her screen while the job executes.

3) The jobclient polls one failed task and one succeeded task to get the user logs and shows
them on the screen for the user.
  i) Why just one of each?
    So that we do not clutter the screen with all the user logs.
  ii) Why only succeeded/failed?
    This is based on providing the user a sample of each kind of task, so that if all
of the tasks fail, which might be due to the user's code, he/she gets an idea of why they failed.
This is not really runtime feedback, since the tasks will have either failed or
succeeded by the time the user gets the logs, but the user can still kill the job in case something
is horribly wrong with it.
  iii) This would entail running a servlet on each of the tasktrackers to serve up the user
logs to the jobclient.

4) We can provide an extra option if the user wants these log files in DFS. This
would allow the user to go through the logs as and when he/she wishes. This could be done by
providing a jobconf variable, so that after each job is executed the logs are automatically
transferred to DFS, or as a command-line option (which would mean creating a new protocol
from jobclient -> jobtracker -> tasktrackers to put all the files for a given jobid into DFS).

5) These job files could be deleted on a time basis (after every 48 hrs?)
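
As a rough illustration of points 1, 3 and 5 above, here is a minimal Java sketch. It is not
Hadoop code: the class UserLogSketch, its methods, the Status enum and the TaskReport type are
all hypothetical names made up for this example, and the 48-hour window is only the value
floated in point 5, read here as a maximum file age. The servlet from point 3(iii) and the DFS
copy from point 4 are left out.

```java
import java.nio.file.Path;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; none of these names come from Hadoop itself. */
public class UserLogSketch {
    /** Point 5: assumed retention window of 48 hours. */
    public static final long RETENTION_MILLIS = TimeUnit.HOURS.toMillis(48);

    /** Task states as seen by the jobclient (assumed, simplified). */
    public enum Status { SUCCEEDED, FAILED, RUNNING }

    /** Minimal stand-in for whatever the jobtracker reports per task. */
    public static final class TaskReport {
        public final String taskId;
        public final Status status;

        public TaskReport(String taskId, Status status) {
            this.taskId = taskId;
            this.status = status;
        }
    }

    /** Point 1: one user log file per job, <logDir>/<jobId>.log. */
    public static Path jobLogFile(Path logDir, String jobId) {
        return logDir.resolve(jobId + ".log");
    }

    /** Point 3: at most one sample task per finished status (first seen wins). */
    public static Map<Status, String> pickSamples(List<TaskReport> reports) {
        Map<Status, String> samples = new EnumMap<>(Status.class);
        for (TaskReport r : reports) {
            if (r.status != Status.RUNNING) {
                samples.putIfAbsent(r.status, r.taskId);
            }
        }
        return samples;
    }

    /** Point 5: a job log is eligible for deletion once it is older than the window. */
    public static boolean isExpired(long lastModifiedMillis, long nowMillis) {
        return nowMillis - lastModifiedMillis > RETENTION_MILLIS;
    }
}
```

The jobclient would call pickSamples on the task reports it already polls, then fetch only
those two logs from the tasktrackers, while a cleanup thread on each tasktracker could apply
isExpired to the files in the user log directory.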

> Seperating user logs from system logs in map reduce
> ---------------------------------------------------
>                 Key: HADOOP-489
>                 URL: http://issues.apache.org/jira/browse/HADOOP-489
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Mahadev konar
>         Assigned To: Mahadev konar
>            Priority: Minor
> Currently the user logs are a part of system logs in mapreduce. Anything logged by the
user is logged into the tasktracker log files. This creates two issues:
> 1) The system log files get cluttered with user output. If the user outputs a large amount
of logs, the system logs need to be cleaned up pretty often.
> 2) For the user, it is difficult to get to each of the machines and look for the logs
his/her job might have generated.
> I am proposing three solutions to the problem. Each of them has its own issues.
> Solution 1.
> Output the user logs on the user screen as part of the job submission process. 
> Merits- 
> This will prevent users from printing large amounts of logs, and the user can get runtime
feedback on what is wrong with his/her job.
> Issues - 
> This proposal will use the framework bandwidth while running jobs for the user. The user
logs will need to pass from the tasks to the tasktrackers, from the tasktrackers to the jobtracker
and then from the jobtracker to the jobclient, using a lot of framework bandwidth if the user
is printing out too much data.
> Solution 2.
> Output the user logs onto a dfs directory and then concatenate these files. Each task
can create a file for the output in the log directory for a given user and jobid.
> Issues -
> This will create a huge number of small files in DFS, which can later be concatenated
into a single file. There is also the question of who would concatenate these files into a
single file. This could be done by the framework (jobtracker) as part of the cleanup for the
jobs, but that might stress the jobtracker.
> Solution 3.
> Put the user logs into a separate user log file in the log directory on each tasktracker.
We can provide some tools to query these local log files. We could have commands like "for
jobid j and taskid t, get me the user log output". These tools could run as a separate map
reduce program, with each map grepping the user log files and a single reduce aggregating these
logs into a single dfs file.
> Issues-
> This does sound like more work for the user. Also, the output might not be complete since
a tasktracker might have gone down after it ran the job.
> Any thoughts?

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

