hadoop-common-dev mailing list archives

From Benjamin Reed <br...@yahoo-inc.com>
Subject Re: [jira] Created: (HADOOP-489) Separating user logs from system logs in map reduce
Date Tue, 29 Aug 2006 18:53:16 GMT
I like Solution 3, as long as there is an API to query the logs.
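No such query API exists yet; as a rough sketch, assuming each tasktracker wrote a task's user output to a per-job, per-task file under a local log directory (the class name and `userlogs/<jobid>/<taskid>.log` layout are both hypothetical), a query call might look like:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a user-log query API for Solution 3.
// Assumes each tasktracker writes user output to
//   <logDir>/userlogs/<jobid>/<taskid>.log   (layout is an assumption)
public class UserLogQuery {
    private final Path logDir;

    public UserLogQuery(Path logDir) {
        this.logDir = logDir;
    }

    // Return the user log for one task of one job, or null if this
    // tasktracker has no log file for that task.
    public String getTaskLog(String jobId, String taskId) throws IOException {
        Path logFile = logDir.resolve("userlogs").resolve(jobId)
                             .resolve(taskId + ".log");
        if (!Files.exists(logFile)) {
            return null;  // the task may have run on another tasktracker
        }
        return new String(Files.readAllBytes(logFile));
    }
}
```

A command-line tool (or the map-reduce grep from Solution 3 below) would just call this against each tasktracker's local log directory.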


On Tuesday 29 August 2006 11:33, Mahadev konar (JIRA) wrote:
> Separating user logs from system logs in map reduce
> ---------------------------------------------------
>                  Key: HADOOP-489
>                  URL: http://issues.apache.org/jira/browse/HADOOP-489
>              Project: Hadoop
>           Issue Type: Improvement
>           Components: mapred
>             Reporter: Mahadev konar
>          Assigned To: Mahadev konar
>             Priority: Minor
> Currently the user logs are part of the system logs in mapreduce. Anything
> logged by the user is logged into the tasktracker log files. This creates
> two issues - 1) The system log files get cluttered with user output. If the
> user outputs a large volume of logs, the system logs need to be cleaned up
> pretty often. 2) It is difficult for the user to get to each of the
> machines and look for the logs his/her job might have generated.
> I am proposing three solutions to the problem. All of them have their own
> issues -
> Solution 1.
> Output the user logs on the user screen as part of the job submission
> process.
> Merits -
> This will discourage users from printing large amounts of logs, and the
> user can get runtime feedback on what is wrong with his/her job.
> Issues -
> This proposal will consume framework bandwidth while jobs are running for
> the user. The user logs would need to pass from the tasks to the
> tasktrackers, from the tasktrackers to the jobtracker, and then from the
> jobtracker to the jobclient, using a lot of framework bandwidth if the user
> prints out too much data.
> Solution 2.
> Output the user logs into a dfs directory and then concatenate these files.
> Each task can create a file for its output in the log directory for a given
> user and jobid.
> Issues -
> This will create a huge number of small files in DFS, which later have to
> be concatenated into a single file. There is also the question of who would
> concatenate these files into a single file. This could be done by the
> framework (jobtracker) as part of the cleanup for the job, but it might
> stress the jobtracker.
> Solution 3.
> Put the user logs into a separate user log file in the log directory on
> each tasktracker. We can provide some tools to query these local log
> files. We could have commands like "for jobid j and taskid t, get me the
> user log output". These tools could run as a separate map reduce program,
> with each map grepping the user log files and a single reduce aggregating
> these logs into a single dfs file.
> Issues -
> This does sound like more work for the user. Also, the output might not be
> complete, since a tasktracker might have gone down after it ran the task.
> Any thoughts?
