Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <A2E2F9CF-31E1-4BF9-91F3-FA0EB0BDC91D@gmail.com>
References: <3b4537b4-07bc-e29a-b5c2-a4059067d691@me.com>
	<A2E2F9CF-31E1-4BF9-91F3-FA0EB0BDC91D@gmail.com>
Date: Thu, 29 Jul 2010 00:58:07 -0700
Message-ID: <AANLkTimNLtkw8ZdRCBy7VYm6Qa-MXKztGq0+t_NvfwTN@mail.gmail.com>
Subject: Re: Cassandra vs MongoDB
From: Jeff Hammerbacher <hammer@cloudera.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=00c09f88d29945ed73048c82187d

--00c09f88d29945ed73048c82187d
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Having participated in the design of a few of the systems being mentioned,
I'll chime in here and point out that the combination of Flume and Hive
makes CDH3 very useful for log processing. That use case is squarely in
the system's wheelhouse, especially for large collections of log files
(as search logs tend to be).
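
To make the log-processing use case concrete: a Hive query such as
`SELECT status, COUNT(*) FROM logs GROUP BY status` is typically compiled
into a MapReduce job that runs on Hadoop. The following is a minimal,
single-process Python sketch of that map/shuffle/reduce pattern, not Hive's
or Hadoop's actual code. It assumes log lines in Apache Common Log Format;
the table name `logs`, the sample lines, and the function names are all
hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample lines, assuming Apache Common Log Format:
# host ident user [timestamp] "request" status bytes
SAMPLE_LOGS = [
    '10.0.0.1 - - [28/Jul/2010:14:59:00 -0700] "GET /search HTTP/1.1" 200 512',
    '10.0.0.2 - - [28/Jul/2010:14:59:01 -0700] "GET /index.html HTTP/1.1" 200 734',
    '10.0.0.1 - - [28/Jul/2010:14:59:02 -0700] "GET /missing HTTP/1.1" 404 0',
]


def map_status(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (status_code, 1) for each log line with enough fields."""
    fields = line.split()
    # In Common Log Format the status code is the second-to-last field.
    if len(fields) >= 2:
        yield fields[-2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all input lines."""
    mapped = chain.from_iterable(map_status(line) for line in lines)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_by_status(SAMPLE_LOGS))  # {'200': 2, '404': 1}
```

On a real cluster the map and reduce steps run in parallel across many
machines and the framework performs the shuffle; here all three run in one
process, which is only meant to show how the data flows.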

On Wed, Jul 28, 2010 at 2:59 PM, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:

> > "As a result, we designed and built Flume...
> > (I wonder if this could deliver into Cassandra :) )
>
>
> Yes - apparently it's pretty easy to do - I was thinking of doing it but
> haven't found the time yet.
>
> https://issues.cloudera.org//browse/FLUME-20
>
> On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:
>
> >
> >> If you are looking to store web logs and then do ad hoc queries you
> >> might/should be using Hadoop (depending on how big your logs are)
> >
> > I agree. Take a look at Cloudera's Hadoop 3 (CDH3); they include an app
> > called Flume for moving data...
> >
> > "As a result, we designed and built Flume. Flume is a distributed service
> > that makes it very easy to collect and aggregate your data into a persistent
> > store such as HDFS. Flume can read data from almost any source – log files,
> > Syslog packets, the standard output of any Unix process – and can deliver it
> > to a batch processing system like Hadoop or a real-time data store like
> > HBase. All this can be configured dynamically from a single, central
> > location – no more tedious configuration file editing and process
> > restarting. Flume will collect the data from wherever existing applications
> > are storing it, and whisk it away for further analysis and processing."
> >
> > (I wonder if this could deliver into Cassandra :) )
> >
> > If it's straight log file processing Hadoop may be a better fit.
> >
> > Aaron
>
>

--00c09f88d29945ed73048c82187d--