Date: Tue, 8 Jan 2008 07:50:34 -0800 (PST)
From: "Enis Soztutar (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-2206) Design/implement a general log-aggregation framework for Hadoop

    [ https://issues.apache.org/jira/browse/HADOOP-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556918#action_12556918 ]

Enis Soztutar commented on HADOOP-2206:
---------------------------------------

Jeff, any update on this? Maybe you can provide us with an alpha release so we can get our hands dirty.

> Design/implement a general log-aggregation framework for Hadoop
> ---------------------------------------------------------------
>
>                 Key: HADOOP-2206
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2206
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs, mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.17.0
>
>
> I'd like to propose a log-aggregation framework which facilitates the collection, aggregation and storage of the logs of the Hadoop Map-Reduce framework and of user-jobs in HDFS. Clearly the design/implementation of this framework is heavily influenced and limited by Hadoop itself, e.g. the lack of appends, the need to avoid too many small files (think: stdout/stderr/syslog of each map/reduce task), and so on.
> This framework will be especially useful once HoD (HADOOP-1301) is used to provision dynamic, per-user, Map-Reduce clusters.
> h4. Requirements:
> * Store the various logs at a configurable location in the Hadoop Distributed FileSystem:
> ** User task logs (stdout, stderr, syslog)
> ** Map-Reduce daemons' logs (JobTracker and TaskTracker)
> * Integrate well with Hadoop and ensure no adverse performance impact on the Map-Reduce framework.
> * It must not use one HDFS file (or more!) per task, which would swamp the NameNode.
> * The aggregation system must be distributed and reliable.
> * Facilities/tools to read the aggregated logs.
> * The aggregated logs should be compressed.
> h4. Architecture:
> Here is a high-level overview of the log-aggregation framework:
> h5. Logging
> * Provision a cloud of log-aggregators in the cluster (outside of the Map-Reduce framework itself, running on a subset of the nodes in the cluster). Let's call each member of the cloud a Log Aggregator, i.e. LA.
> * Each LA writes out 2 files per Map-Reduce cluster: an index file and a data file. The LA maintains one directory per Map-Reduce cluster on HDFS.
> * The index file format is simple (a sketch follows this list):
> ** streamid (_streamid_ is either a daemon identifier, e.g. tasktracker_foo.bar.com:57891, or $jobid-$taskid-(stdout|stderr|syslog) for individual task-logs)
> ** timestamp
> ** logs-data start offset
> ** no. of bytes
> * Each Hadoop daemon (JT/TT) is given the entire list of LAs in the cluster.
> * Each daemon picks one LA (at random) from the list, opens an exclusive stream with the LA after identifying itself (i.e. ${daemonid}) and sends its logs. On error/failure to log, it simply connects to another LA as above and starts logging to it.
> * The logs are sent to the LA by a new log4j appender (also sketched after this list). The appender provides some amount of buffering on the client side.
> * Implement a feature in the TaskTracker which lets it use the same appender to send out the userlogs (stdout/stderr/syslog) to the LA after task completion. This is important to ensure that logging to the LA at runtime doesn't hurt the task's performance (see HADOOP-1553). The TaskTracker picks an LA per task in a manner similar to the one it uses for its own logs, identifies itself (<${jobid}, ${taskid}, {stdout|stderr|syslog}>) and streams the entire task-log in one go. In fact we can pick different LAs for each of the task's stdout, stderr and syslog logs - each an exclusive stream to a single LA.
> * The LA buffers some amount of data in memory (say 16K) and then flushes that data to the HDFS data file (one per LA per cluster) after writing out an entry to the index file.
> * The LA periodically purges old logs (monthly, fortnightly or weekly, as today).
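> To make the index-entry format concrete, here is a minimal sketch of one index record as a Hadoop Writable - the class name, field types and on-disk encoding are illustrative assumptions, not a committed design:
> {code:java}
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.io.Writable;
>
> /** One record in an LA's index file: locates one flushed chunk of a
>  *  stream's log-data within the LA's data file. (Sketch only.) */
> public class IndexEntry implements Writable {
>   private Text streamId = new Text(); // e.g. "tasktracker_foo.bar.com:57891"
>   private long timestamp;             // when the chunk was flushed
>   private long startOffset;           // offset of the chunk in the data file
>   private int numBytes;               // no. of bytes in the chunk
>
>   public String getStreamId() { return streamId.toString(); }
>
>   public void write(DataOutput out) throws IOException {
>     streamId.write(out);
>     out.writeLong(timestamp);
>     out.writeLong(startOffset);
>     out.writeInt(numBytes);
>   }
>
>   public void readFields(DataInput in) throws IOException {
>     streamId.readFields(in);
>     timestamp = in.readLong();
>     startOffset = in.readLong();
>     numBytes = in.readInt();
>   }
> }
> {code}
> And a skeleton of the client-side log4j appender described above - LAClient is a hypothetical placeholder for the actual wire protocol to an LA:
> {code:java}
> import java.io.IOException;
> import org.apache.log4j.AppenderSkeleton;
> import org.apache.log4j.spi.LoggingEvent;
>
> /** Sketch: buffers formatted events on the client side and ships them
>  *  to the currently-chosen LA, failing over to another LA on error. */
> public class LogAggregatorAppender extends AppenderSkeleton {
>   private static final int BUFFER_SIZE = 16 * 1024;
>   private final StringBuilder buffer = new StringBuilder();
>   private LAClient la = LAClient.pickRandom(); // hypothetical client
>
>   protected void append(LoggingEvent event) {
>     buffer.append(layout.format(event));
>     if (buffer.length() >= BUFFER_SIZE) {
>       flush();
>     }
>   }
>
>   private void flush() {
>     try {
>       la.send(buffer.toString());
>       buffer.setLength(0);
>     } catch (IOException ioe) {
>       // keep the buffer and fail over to a different LA, as proposed above
>       la = LAClient.pickRandom();
>     }
>   }
>
>   public void close() { flush(); la.close(); }
>   public boolean requiresLayout() { return true; }
> }
> {code}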
> h5. Getting the logged information
> The main requirement is to implement a simple set of tools to query the LA (i.e. the index/data files on HDFS) to glean the logged information.
> If we think of each Map-Reduce cluster's logs as a set of archives (i.e. one file per cluster per LA used), we need the ability to query the log-archive to figure out the available streams, and the ability to get one entire stream or a subset of it based on timestamp-ranges. Essentially these are simple tools which parse the index files of each LA (for a given Hadoop cluster) and return the required information.
> h6. Query for available streams
> The query just returns all the available streams in a cluster-log archive identified by the HDFS path (a sketch of such a tool follows the sample listing).
> It looks something like this for a cluster with 3 nodes which ran 2 jobs, the first of which had 2 maps and 1 reduce, the second 1 map and 1 reduce:
> {noformat}
> $ la -query /log-aggregation/cluster-20071113
> Available streams:
> jobtracker_foo.bar.com:57893
> tasktracker_baz.bar.com:57841
> tasktracker_fii.bar.com:57891
> job_20071113_0001-task_20071113_0001_m_000000_0-stdout
> job_20071113_0001-task_20071113_0001_m_000000_0-stderr
> job_20071113_0001-task_20071113_0001_m_000000_0-syslog
> job_20071113_0001-task_20071113_0001_m_000001_0-stdout
> job_20071113_0001-task_20071113_0001_m_000001_0-stderr
> job_20071113_0001-task_20071113_0001_m_000001_0-syslog
> job_20071113_0001-task_20071113_0001_r_000000_0-stdout
> job_20071113_0001-task_20071113_0001_r_000000_0-stderr
> job_20071113_0001-task_20071113_0001_r_000000_0-syslog
> job_20071113_0002-task_20071113_0002_m_000000_0-stdout
> job_20071113_0002-task_20071113_0002_m_000000_0-stderr
> job_20071113_0002-task_20071113_0002_m_000000_0-syslog
> job_20071113_0002-task_20071113_0002_r_000000_0-stdout
> job_20071113_0002-task_20071113_0002_r_000000_0-stderr
> job_20071113_0002-task_20071113_0002_r_000000_0-syslog
> {noformat}
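> A rough sketch of how such a query tool could scan the index files, reusing the hypothetical IndexEntry above (the assumption that each LA's index is a SequenceFile named *.index is mine, not part of the proposal):
> {code:java}
> import java.io.IOException;
> import java.util.TreeSet;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.SequenceFile;
>
> /** Sketch of 'la -query': list the distinct streams in a cluster-log
>  *  archive by scanning every LA's index file under the archive dir. */
> public class ListStreams {
>   public static void main(String[] args) throws IOException {
>     Path archive = new Path(args[0]); // e.g. /log-aggregation/cluster-20071113
>     Configuration conf = new Configuration();
>     FileSystem fs = archive.getFileSystem(conf);
>
>     TreeSet<String> streams = new TreeSet<String>();
>     for (FileStatus stat : fs.listStatus(archive)) {
>       if (!stat.getPath().getName().endsWith(".index")) continue;
>       SequenceFile.Reader reader =
>           new SequenceFile.Reader(fs, stat.getPath(), conf);
>       try {
>         IndexEntry entry = new IndexEntry();
>         NullWritable val = NullWritable.get();
>         while (reader.next(entry, val)) {
>           streams.add(entry.getStreamId());
>         }
>       } finally {
>         reader.close();
>       }
>     }
>     System.out.println("Available streams:");
>     for (String stream : streams) {
>       System.out.println(stream);
>     }
>   }
> }
> {code}
> Fetching a stream for a timestamp-range (the -fetch examples below) would work the same way: filter the index entries by streamid and timestamp, then seek to each entry's start offset in that LA's data file and read the recorded number of bytes.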
> h6. Get logged information per stream
> The framework also offers the ability to query and fetch the actual log-data, per stream, for a given timestamp-range. It looks something like:
> {noformat}
> $ la -fetch -daemon jt -range <timestamp-range> /log-aggregation/cluster-20071113
> $ la -fetch -daemon tt1 /log-aggregation/cluster-20071113
> $ la -fetch -jobid <jobid> -taskid <taskid> -log <stdout|stderr|syslog> -range <timestamp-range> /log-aggregation/cluster-20071113
> {noformat}
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.