Date: Thu, 29 Jul 2010 00:58:07 -0700
From: Jeff Hammerbacher
To: user@cassandra.apache.org
Subject: Re: Cassandra vs MongoDB

Having participated in the design of a few of the systems mentioned here, I'll chime in and point out that the combination of Flume and Hive makes CDH3 very useful for log processing. That use case is squarely in the system's wheelhouse, especially for large collections of log files (as search logs tend to be).

On Wed, Jul 28, 2010 at 2:59 PM, Jeremy Hanna <jeremy.hanna1234@gmail.com> wrote:
> "As a result, we designed and built Flume...
> (I wonder if this could deliver into Cassandra :) )

Yes, apparently it's pretty easy to do. I was thinking of doing it but haven't found the time yet.

https://issues.cloudera.org//browse/FLUME-20

On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:

>
>> If you are looking to store web logs and then do ad hoc queries you might/should be using Hadoop (depending on how big your logs are)
>
> I agree. Take a look at Cloudera's Hadoop distribution, CDH3; it includes an app called Flume for moving data...
>
> "As a result, we designed and built Flume. Flume is a distributed service that makes it very easy to collect and aggregate your data into a persistent store such as HDFS. Flume can read data from almost any source – log files, Syslog packets, the standard output of any Unix process – and can deliver it to a batch processing system like Hadoop or a real-time data store like HBase. All this can be configured dynamically from a single, central location – no more tedious configuration file editing and process restarting. Flume will collect the data from wherever existing applications are storing it, and whisk it away for further analysis and processing."
>
> (I wonder if this could deliver into Cassandra :) )
>
> If it's straight log file processing, Hadoop may be a better fit.
>
> Aaron
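For anyone following the thread who hasn't used Flume, the dataflow described in the quoted announcement (many line-oriented sources drained into one persistent store, which batch jobs then query ad hoc) can be sketched in a few lines of Python. This is not Flume's API: `Event`, `line_source`, `ListSink`, `collect` and `status_counts` are hypothetical names, the in-memory list merely stands in for HDFS, HBase or Cassandra, and the log lines are assumed to be in Common Log Format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Event:
    """One raw log line, tagged with the name of the source it came from (hypothetical type)."""
    source: str
    body: str


def line_source(name: str, lines: Iterable[str]) -> Iterator[Event]:
    """Wrap any line-oriented input (a log file, a syslog feed, a process's
    stdout) as a stream of events, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield Event(source=name, body=line)


@dataclass
class ListSink:
    """In-memory stand-in for a persistent store such as HDFS, HBase or Cassandra."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def collect(sources: Iterable[Iterator[Event]], sink: ListSink) -> int:
    """Drain every source into the single sink; return the number of events delivered."""
    delivered = 0
    for source in sources:
        for event in source:
            sink.append(event)
            delivered += 1
    return delivered


def status_counts(sink: ListSink) -> Counter:
    """Toy ad hoc query: count HTTP status codes, assuming Common Log Format,
    where the status code is the second-to-last whitespace-separated field."""
    counts: Counter = Counter()
    for event in sink.events:
        parts = event.body.split()
        if len(parts) >= 2 and parts[-2].isdigit():
            counts[parts[-2]] += 1
    return counts


if __name__ == "__main__":
    web_log = [
        '1.2.3.4 - - [28/Jul/2010:14:00:00 -0700] "GET / HTTP/1.1" 200 512\n',
        '1.2.3.5 - - [28/Jul/2010:14:00:01 -0700] "GET /x HTTP/1.1" 404 0\n',
    ]
    app_stdout = ["\n", "worker started\n"]
    sink = ListSink()
    n = collect([line_source("web", web_log), line_source("app", app_stdout)], sink)
    print(n, status_counts(sink))
```

The parts Flume actually exists to handle, such as delivery guarantees and the dynamic, centrally managed configuration mentioned in the quote, are deliberately left out; the sketch only shows the shape of the collect-then-query pipeline.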

