From: Sandy Ryza <sandy.ryza@cloudera.com>
To: user@hadoop.apache.org
Date: Fri, 11 Oct 2013 10:00:43 -0700
Subject: Re: State of Art in Hadoop Log aggregation

Just a clarification: Cloudera Manager is now free for any number of nodes.
Ref: http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html

-Sandy

On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX <dsuiter@rdx.com> wrote:

> Sagar,
>
> It sounds like you want a management console. We are using Cloudera
> Manager, but for 200 nodes you would need to license it; it is only free
> for up to 50 nodes.
>
> The FOSS version of this is Ambari, IIRC:
> http://incubator.apache.org/ambari/
>
> Flume will provide a Hadoop-integrated pipeline for ingesting data, but
> the data will still need to be analyzed and visualized after it lands.
> Kafka is a newer project for collecting and aggregating logs, but it is a
> separate system and will need servers of its own to manage.
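>
> For illustration, a minimal per-node Flume agent along these lines (a
> sketch assuming Flume 1.x; the agent name, log path, and NameNode URL are
> placeholders, not anything specific to your cluster):
>
>     # Tail one Hadoop daemon log and ship the events into HDFS
>     a1.sources = r1
>     a1.channels = c1
>     a1.sinks = k1
>
>     # exec source: 'tail -F' keeps following the log across rotations
>     a1.sources.r1.type = exec
>     a1.sources.r1.command = tail -F /var/log/hadoop/hadoop-datanode.log
>     a1.sources.r1.channels = c1
>
>     # in-memory buffer between source and sink
>     a1.channels.c1.type = memory
>     a1.channels.c1.capacity = 10000
>
>     # HDFS sink: plain-text files, bucketed by day
>     a1.sinks.k1.type = hdfs
>     a1.sinks.k1.channel = c1
>     a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
>     a1.sinks.k1.hdfs.fileType = DataStream
>     # exec events carry no timestamp header, so use the local clock
>     # for the %Y-%m-%d escapes in the path
>     a1.sinks.k1.hdfs.useLocalTimeStamp = true
>
> started with something like:
>
>     flume-ng agent --conf conf --conf-file agent.conf --name a1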
>
> We also use Splunk, since it is approved by our compliance auditors.
>
> Thanks,
> Devin Suiter
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <
> wget.null@gmail.com> wrote:
>
>> Hi,
>>
>> http://flume.apache.org
>>
>> - Alex
>>
>> On Oct 11, 2013, at 7:36 AM, Sagar Mehta wrote:
>>
>> Hi Guys,
>>
>> We have a fairly decent-sized Hadoop cluster of about 200 nodes, and I
>> was wondering what the state of the art is for aggregating and
>> visualizing Hadoop ecosystem logs, particularly:
>>
>> 1. TaskTracker logs
>> 2. DataNode logs
>> 3. HBase RegionServer logs
>>
>> One way is to run something like Flume on each node to aggregate the
>> logs and then use something like Kibana -
>> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs
>> and make them searchable (see the sink sketch after this thread).
>>
>> However, I don't want to write another ETL pipeline for the Hadoop/HBase
>> logs themselves. We currently log in to each machine individually and
>> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>>
>> We want a better way to look at the Hadoop logs in a centralized way
>> when there is an issue, without having to log in to 100 different
>> machines, and I was wondering what the state of the art is in this
>> regard.
>>
>> Suggestions/pointers are very welcome!!
>>
>> Sagar
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
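
For the Flume-plus-Kibana route Sagar describes, only the sink side would
change: Flume 1.3+ ships an Elasticsearch sink, so the same tail-style
agents can feed an index that Kibana then searches. A rough sketch, assuming
the Elasticsearch client libraries are on the Flume classpath; the host,
cluster, and index names below are placeholders:

    # Swap the HDFS sink above for an Elasticsearch sink;
    # the source and channel stay the same
    a1.sinks.k1.type = elasticsearch
    a1.sinks.k1.channel = c1
    # Elasticsearch transport port (9300), not the HTTP port (9200)
    a1.sinks.k1.hostNames = es-host:9300
    a1.sinks.k1.indexName = hadoop-logs
    a1.sinks.k1.indexType = log
    a1.sinks.k1.clusterName = elasticsearch

The sink's default serializer writes Logstash-style documents, which is the
layout Kibana expects, so basic centralized search should not require
writing a separate ETL for the logs.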