From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Flume and Cassandra
Date: Fri, 10 Feb 2012 22:35:03 +1300
Message-Id: <9A11F5E8-6E83-427B-AEAD-6E66ED0E779D@thelastpickle.com>

> How do I do it? Do I need to build a custom plugin/sink, or can I configure an existing sink to write data in a custom way?

This is a good starting point: https://github.com/thobbs/flume-cassandra-plugin

> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift). How do I ensure that log writing won't overload my database and introduce latency into my business process?

Any time you have a data stream you don't control, it's a good idea to put some sort of buffer between the outside world and the database. Flume has a buffered sink; I think you can subclass it and aggregate the counters for a minute or two: http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics

Hope that helps.
A

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:

> Hi,
>
> 1 - I would like to generate some statistics and store some raw events from log files tailed with Flume. I saw some plugins providing Cassandra sinks, but I would like to store data in a custom way: storing the raw data but also incrementing counters to get near-real-time statistics. How do I do it? Do I need to build a custom plugin/sink, or can I configure an existing sink to write data in a custom way?
>
> 2 - My business process also uses my Cassandra DB (without Flume, directly via Thrift). How do I ensure that log writing won't overload my database and introduce latency into my business process? I mean, is there a way to manage the throughput sent by Flume's tails and slow them down when my Cassandra cluster is overloaded? I would like to avoid building two separate clusters.
>
> Thank you,
>
> Alain
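The aggregate-then-flush idea suggested above can be sketched roughly as below. This is a minimal, hypothetical illustration, not Flume or flume-cassandra-plugin API: the `CounterBuffer` class and its method names are invented for the example, and the `println` in `flush()` stands in for the single batched write (e.g. one `batch_mutate` of counter adds) you would issue through your Cassandra client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: accumulate counter increments in memory and flush
// them as one batch per time window, instead of issuing one database
// write per log event. In a custom sink you would call increment() from
// the per-event append path and do the real Cassandra write in flush().
class CounterBuffer {
    private final Map<String, Long> counts = new HashMap<>();
    private final long windowMillis;
    private long windowStart;

    CounterBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
        this.windowStart = System.currentTimeMillis();
    }

    // Called once per event: a cheap in-memory increment, with an
    // automatic flush once the aggregation window has elapsed.
    synchronized void increment(String key) {
        counts.merge(key, 1L, Long::sum);
        if (System.currentTimeMillis() - windowStart >= windowMillis) {
            flush();
        }
    }

    // Drains the window's totals; replace the println with one batched
    // counter write via your Cassandra client.
    synchronized Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<>(counts);
        counts.clear();
        windowStart = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : batch.entrySet()) {
            System.out.println(e.getKey() + " += " + e.getValue());
        }
        return batch;
    }
}
```

Because the buffer only ever issues one batched write per window, a burst in the tailed logs costs memory in the buffer rather than write load on the cluster, which addresses the second question about not overloading the business-facing database.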