From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Flume and Cassandra
Date: Fri, 10 Feb 2012 22:35:03 +1300
Message-Id: <9A11F5E8-6E83-427B-AEAD-6E66ED0E779D@thelastpickle.com>

> How do I do it? Do I need to build a custom plugin/sink, or can I configure an existing sink to write data in a custom way?

This is a good starting point: https://github.com/thobbs/flume-cassandra-plugin

> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift). How do I ensure that log writing won't overload my database and introduce latency into my business process?

Any time you have a data stream you don't control, it's a good idea to put some sort of buffer between the outside world and the database. Flume has a buffered sink; I think you can subclass it and aggregate the counters for a minute or two: http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics

Hope that helps.
A

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:

> Hi,
>
> 1 - I would like to generate some statistics and store some raw events from log files tailed with Flume. I saw some plugins providing Cassandra sinks, but I would like to store data in a custom way: storing the raw data but also incrementing counters to get near-real-time statistics. How do I do it? Do I need to build a custom plugin/sink, or can I configure an existing sink to write data in a custom way?
>
> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift). How do I ensure that log writing won't overload my database and introduce latency into my business process? I mean, is there a way to manage the throughput sent by Flume's tails and slow them down when my Cassandra cluster is overloaded? I would like to avoid building two separate clusters.
>
> Thank you,
>
> Alain
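The aggregate-then-flush idea suggested above can be sketched roughly as below. This is a minimal, hypothetical illustration, not Flume or flume-cassandra-plugin API: the `CounterBuffer` class and its method names are invented for the example, and the `println` in `flush()` stands in for the single batched write (e.g. one `batch_mutate` of counter adds) you would issue through your Cassandra client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulate counter increments in memory and flush
// them as one batch per time window, instead of issuing one database
// write per log event. In a custom sink you would call increment() from
// the per-event append path and do the real Cassandra write in flush().
class CounterBuffer {
    private final Map<String, Long> counts = new HashMap<>();
    private final long windowMillis;
    private long windowStart;

    CounterBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    // Called once per event: a cheap in-memory increment, with an
    // automatic flush once the aggregation window has elapsed.
    synchronized void increment(String key) {
        counts.merge(key, 1L, Long::sum);
        if (System.currentTimeMillis() - windowStart >= windowMillis) {
            flush();
        }
    }

    // Drains the window's totals; replace the println with one batched
    // counter write via your Cassandra client.
    synchronized Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<>(counts);
        counts.clear();
        windowStart = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : batch.entrySet()) {
            System.out.println(e.getKey() + " += " + e.getValue());
        }
        return batch;
    }
}
```

Because the buffer only ever issues one batched write per window, a burst in the tailed logs costs memory in the buffer rather than write load on the cluster, which addresses the second question about not overloading the business-facing database.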