Subject: Re: State of Art in Hadoop Log aggregation
From: DSuiter RDX <dsuiter@rdx.com>
To: user@hadoop.apache.org
Date: Fri, 11 Oct 2013 10:05:30 -0400

Sagar,

It sounds like you want a management console. We are using Cloudera Manager, but for 200 nodes you would need to license it; it is only free up to 50 nodes. The FOSS version of this is Ambari, iirc: http://incubator.apache.org/ambari/

Flume will provide a Hadoop-integrated pipeline for ingesting data, but the data will still need to be analyzed and visualized separately if you go that route. Kafka is a newer project for collecting and aggregating logs, but it is a separate project and will need a server of its own to manage.
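If it helps, here is roughly what the Flume route would look like on each node. This is a minimal sketch of an agent config, assuming Flume 1.x; the agent name, file paths, and HDFS URL are made up for illustration:

    # flume-conf.properties (hypothetical): tail one Hadoop log into HDFS
    agent1.sources = taillog
    agent1.channels = memch
    agent1.sinks = tohdfs

    # exec source re-runs 'tail -F'; it can drop events if the agent restarts
    agent1.sources.taillog.type = exec
    agent1.sources.taillog.command = tail -F /var/log/hadoop/hadoop-tasktracker.log
    agent1.sources.taillog.channels = memch

    # in-memory channel: fast, but buffered events are lost on a crash
    agent1.channels.memch.type = memory
    agent1.channels.memch.capacity = 10000

    # HDFS sink writing plain-text events
    agent1.sinks.tohdfs.type = hdfs
    agent1.sinks.tohdfs.channel = memch
    agent1.sinks.tohdfs.hdfs.path = hdfs://namenode:8020/logs/hadoop
    agent1.sinks.tohdfs.hdfs.fileType = DataStream

You would start it with something like 'flume-ng agent --conf-file flume-conf.properties --name agent1'. For Kafka, the quickest experiment (again just a sketch, assuming Kafka 0.8 with a broker at the placeholder address broker1:9092) is piping the tail into the console producer:

    tail -F /var/log/hadoop/hadoop-tasktracker.log | \
      kafka-console-producer.sh --broker-list broker1:9092 --topic hadoop-logs

Either way, you still need Elasticsearch/Kibana, Splunk, or similar on the other end to make the logs searchable.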
We use Splunk also, since it is approved by our auditing compliance agency.

Thanks,
Devin Suiter
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <wget.null@gmail.com> wrote:

> Hi,
>
> http://flume.apache.org
>
> - Alex
>
> On Oct 11, 2013, at 7:36 AM, Sagar Mehta wrote:
>
> Hi Guys,
>
> We have a fairly decent-sized Hadoop cluster of about 200 nodes and I was
> wondering what the state of the art is if I want to aggregate and
> visualize Hadoop ecosystem logs, particularly:
>
> 1. TaskTracker logs
> 2. DataNode logs
> 3. HBase RegionServer logs
>
> One way is to use something like Flume on each node to aggregate the logs
> and then use something like Kibana -
> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and
> make them searchable.
>
> However, I don't want to write another ETL for the Hadoop/HBase logs
> themselves. We currently log in to each machine individually to 'tail -F'
> the logs when there is a Hadoop problem on a particular node.
>
> We want a better way to look at the Hadoop logs in a centralized way when
> there is an issue, without having to log in to 100 different machines,
> and I was wondering what the state of the art is in this regard.
>
> Suggestions/Pointers are very welcome!!
>
> Sagar
>
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF