Subject: State of Art in Hadoop Log aggregation
From: Sagar Mehta <sagarmehta@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 10 Oct 2013 22:36:24 -0700

Hi Guys,

We have a fairly decent-sized Hadoop cluster of about 200 nodes, and I was wondering what the state of the art is for aggregating and visualizing Hadoop ecosystem logs, particularly:

  1. TaskTracker logs
  2. DataNode logs
  3. HBase RegionServer logs

One way is to run something like a Flume agent on each node to aggregate the logs, and then use something like Kibana - http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and make them searchable. (A rough sketch of the kind of Flume config I have in mind is at the end of this mail.)

However, I don't want to write another ETL pipeline just for the Hadoop/HBase logs themselves. We currently log in to each machine individually and 'tail -F' the logs when there is a Hadoop problem on a particular node.

We want a better way to look at the Hadoop logs in a centralized fashion when there is an issue, without having to log in to 100 different machines, and I was wondering what the state of the art is in this regard.

Suggestions/Pointers are very welcome!!
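For concreteness, here is roughly the per-node Flume agent I'm picturing - just a minimal sketch, not something we run. The agent name, log path, and Elasticsearch host below are placeholders, and it assumes Flume 1.4's Elasticsearch sink:

  # One Flume agent per node: tail a local Hadoop log and ship it to
  # Elasticsearch so Kibana can search it. Names/paths are made up.
  agent1.sources = tasktracker-log
  agent1.channels = mem
  agent1.sinks = es

  # exec source that tails the log (one source per log file we care about)
  agent1.sources.tasktracker-log.type = exec
  agent1.sources.tasktracker-log.command = tail -F /var/log/hadoop/hadoop-tasktracker.log
  agent1.sources.tasktracker-log.channels = mem

  # simple in-memory channel; fine for a sketch, but lossy if the agent restarts
  agent1.channels.mem.type = memory
  agent1.channels.mem.capacity = 10000

  # Flume 1.4 ships an Elasticsearch sink that Kibana can read from
  agent1.sinks.es.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
  agent1.sinks.es.hostNames = es-host.example.com:9300
  agent1.sinks.es.indexName = hadoop_logs
  agent1.sinks.es.channel = mem

That would at least avoid writing a custom ETL for the logs, though I'm not sure how well it holds up at 200 nodes, which is why I'm asking what others do.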
Sagar