From: Sandy Ryza <sandy.ryza@cloudera.com>
To: user@hadoop.apache.org
Date: Fri, 11 Oct 2013 10:00:43 -0700
Subject: Re: State of Art in Hadoop Log aggregation

Just a clarification: Cloudera Manager is now free for any number of nodes.
Ref: http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html

-Sandy

On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX <dsuiter@rdx.com> wrote:

> Sagar,
>
> It sounds like you want a management console. We are using Cloudera
> Manager, but for 200 nodes you would need to license it; it is only free
> for up to 50 nodes.
>
> The FOSS version of this is Ambari, IIRC:
> http://incubator.apache.org/ambari/
>
> Flume will provide a Hadoop-integrated pipeline for ingesting data, but
> the data will still need to be analyzed and visualized after it lands.
> Kafka is a newer project for collecting and aggregating logs, but it is a
> separate system and will need servers of its own to manage.
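>
> For illustration, a minimal per-node Flume agent along these lines (a
> sketch assuming Flume 1.x; the agent name, log path, and NameNode URL are
> placeholders, not anything specific to your cluster):
>
>     # Tail one Hadoop daemon log and ship the events into HDFS
>     a1.sources = r1
>     a1.channels = c1
>     a1.sinks = k1
>
>     # exec source: 'tail -F' keeps following the log across rotations
>     a1.sources.r1.type = exec
>     a1.sources.r1.command = tail -F /var/log/hadoop/hadoop-datanode.log
>     a1.sources.r1.channels = c1
>
>     # in-memory buffer between source and sink
>     a1.channels.c1.type = memory
>     a1.channels.c1.capacity = 10000
>
>     # HDFS sink: plain-text files, bucketed by day
>     a1.sinks.k1.type = hdfs
>     a1.sinks.k1.channel = c1
>     a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
>     a1.sinks.k1.hdfs.fileType = DataStream
>     # exec events carry no timestamp header, so use the local clock
>     # for the %Y-%m-%d escapes in the path
>     a1.sinks.k1.hdfs.useLocalTimeStamp = true
>
> started with something like:
>
>     flume-ng agent --conf conf --conf-file agent.conf --name a1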
>
> We also use Splunk, since it is approved by our compliance auditors.
>
> Thanks,
> Devin Suiter
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <
> wget.null@gmail.com> wrote:
>
>> Hi,
>>
>> http://flume.apache.org
>>
>> - Alex
>>
>> On Oct 11, 2013, at 7:36 AM, Sagar Mehta wrote:
>>
>> Hi Guys,
>>
>> We have a fairly decent-sized Hadoop cluster of about 200 nodes, and I
>> was wondering what the state of the art is for aggregating and
>> visualizing Hadoop ecosystem logs, particularly:
>>
>> 1. TaskTracker logs
>> 2. DataNode logs
>> 3. HBase RegionServer logs
>>
>> One way is to run something like Flume on each node to aggregate the
>> logs and then use something like Kibana -
>> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs
>> and make them searchable (see the sink sketch after this thread).
>>
>> However, I don't want to write another ETL pipeline for the Hadoop/HBase
>> logs themselves. We currently log in to each machine individually and
>> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>>
>> We want a better way to look at the Hadoop logs in a centralized way
>> when there is an issue, without having to log in to 100 different
>> machines, and I was wondering what the state of the art is in this
>> regard.
>>
>> Suggestions/pointers are very welcome!!
>>
>> Sagar
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
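
For the Flume-plus-Kibana route Sagar describes, only the sink side would
change: Flume 1.3+ ships an Elasticsearch sink, so the same tail-style
agents can feed an index that Kibana then searches. A rough sketch, assuming
the Elasticsearch client libraries are on the Flume classpath; the host,
cluster, and index names below are placeholders:

    # Swap the HDFS sink above for an Elasticsearch sink;
    # the source and channel stay the same
    a1.sinks.k1.type = elasticsearch
    a1.sinks.k1.channel = c1
    # Elasticsearch transport port (9300), not the HTTP port (9200)
    a1.sinks.k1.hostNames = es-host:9300
    a1.sinks.k1.indexName = hadoop-logs
    a1.sinks.k1.indexType = log
    a1.sinks.k1.clusterName = elasticsearch

The sink's default serializer writes Logstash-style documents, which is the
layout Kibana expects, so basic centralized search should not require
writing a separate ETL for the logs.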