Date: Tue, 17 Mar 2015 05:36:06 +0000 (UTC)
From: Anuj Wadehra
Reply-To: Anuj Wadehra
To: Ali Akhtar, user@cassandra.apache.org
Subject: Re: Run Mixed Workload using two instances on one node

I understand that 2 instances on one node looks like a weird solution, but we can have dedicated reporting nodes for big customers and not for small ones.

My questions would be:

1. What is the technical reasoning? What problems do you foresee if we use 2 C* instances on one node in production? We have ample HW on each server and it is mostly under-utilized. We just want heavy reporting not to impact OLTP, and both OLTP and reporting should be individually scalable.
2. I think we don't need Elasticsearch. We just need a plain Reporting DB which can answer reporting queries. We can create our own CFs as indexes. We don't need the overhead of another 3PP for our current reporting needs.

Thanks
Anuj


On Tuesday, 17 March 2015 9:59 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:

I don't think it's recommended to have two instances on the same node.
Have you considered using something like Elasticsearch for the reports? It's designed for that sort of thing.

On Mar 17, 2015 8:07 AM, "Anuj Wadehra" <anujw_2003@yahoo.co.in> wrote:

Hi,

We are trying to decouple our Reporting DB from OLTP. Need urgent help on the feasibility of the proposed solution for PRODUCTION.

Use Case: Currently, our OLTP and Reporting application and DB are the same. Some CFs are used for both OLTP and Reporting while others are used solely for Reporting. Every business transaction synchronously updates the main OLTP CF and asynchronously updates the other Reporting CFs.

Problem Statement:
1. Decouple Reporting and OLTP such that Reporting load can't impact OLTP performance.
2. Scaling of the Reporting and OLTP modules must be independent.
3. The OLTP client should not update all Reporting CFs. We generate Data Records on a file system/shared disk; Reporting should use these Records to build the Reporting DB.
4. Small customers may do OLTP and Reporting on the same 3-node cluster. Bigger customers can be given the option of dedicated OLTP and Reporting nodes. So a standard hardware box should be usable for 3 deployments (OLTP, Reporting, or OLTP+Reporting).

Note: Reporting is ad hoc, may involve full table scans, and does not involve analytics. Data size is huge: 2 TB (OLTP+Reporting) per node.

Hardware: Standard deployment is a 3-node cluster, each node having 24 cores, 64 GB RAM, and 6 x 400 GB SSDs in RAID5.

Proposed Solution:
1. Split the OLTP and Reporting clients into two application components.
2. For small deployments where more than 3 nodes are not required:
    A. Install 2 Cassandra instances on each node, one for OLTP and the other for Reporting.
    B. To distribute I/O load 2:1, remove RAID5 (as Cassandra offers replication) and assign 4 disks as JBOD for OLTP and 2 disks for Reporting.
    C. RAM is abundant and often under-utilized, so assign 8 GB to each of the 2 Cassandra instances.
    D. To make sure that Reporting is not able to overload the CPU, tune concurrent_reads and concurrent_writes.

The OLTP client will only write to the OLTP DB and generate DB records. The Reporting client will poll the file system and populate the Reporting DB in the required format.

3. Larger customers can have the Reporting client and DB on dedicated physical nodes with all resources.

Key Questions:
Is it OK to run 2 Cassandra instances on one node in a production system and limit CPU usage, disk I/O, and RAM as suggested above?
Any other solution for the above problem statement?

Thanks
Anuj
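For what it's worth, the per-instance isolation in steps A and C comes down to giving each JVM its own config tree, ports, directories, and heap. A minimal dry-run sketch of deriving the second config follows; every path and port number is an illustrative assumption (the stand-in cassandra.yaml is trimmed to just the relevant keys), not the stock layout of any particular Cassandra package:

```shell
# Sketch: derive a config tree for a second "reporting" Cassandra instance
# from the OLTP one. Sandboxed in a temp dir so the steps can be dry-run;
# all ports and directory names below are illustrative assumptions.
WORK=$(mktemp -d)
OLTP_CONF="$WORK/oltp"
RPT_CONF="$WORK/reporting"
mkdir -p "$OLTP_CONF" "$RPT_CONF"

# Stand-in for the cassandra.yaml shipped with the OLTP instance
# (only the keys relevant to running a second instance are shown).
cat > "$OLTP_CONF/cassandra.yaml" <<'EOF'
cluster_name: OLTP
storage_port: 7000
native_transport_port: 9042
data_file_directories:
    - /data/oltp
commitlog_directory: /commitlog/oltp
EOF

# The reporting instance gets its own cluster name, ports, and directories
# so the two JVMs never gossip with each other or contend for the same files.
sed -e 's/^cluster_name:.*/cluster_name: Reporting/' \
    -e 's/^storage_port:.*/storage_port: 7100/' \
    -e 's/^native_transport_port:.*/native_transport_port: 9142/' \
    -e 's#/data/oltp#/data/reporting#' \
    -e 's#/commitlog/oltp#/commitlog/reporting#' \
    "$OLTP_CONF/cassandra.yaml" > "$RPT_CONF/cassandra.yaml"

grep 'native_transport_port' "$RPT_CONF/cassandra.yaml"
```

In a real deployment the reporting instance would then be launched with CASSANDRA_CONF pointing at the derived directory and MAX_HEAP_SIZE=8G exported (step C), both of which the tarball launch scripts read.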
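On step D (keeping the reporting instance from starving OLTP of CPU and disk), the main cassandra.yaml knobs are the request thread pools (concurrent_reads and concurrent_writes default to 32) and the compaction throttle. A sketch with illustrative values for a rough 2:1 split; the numbers are assumptions to show the mechanism, not tuned recommendations:

```shell
# Sketch: a throttled cassandra.yaml fragment for the reporting instance.
# Values are illustrative assumptions for a 2:1 split, not tuned numbers.
WORK=$(mktemp -d)
RPT_YAML="$WORK/cassandra-reporting.yaml"

cat > "$RPT_YAML" <<'EOF'
# Smaller request thread pools than the OLTP instance's defaults (32/32)
# so ad-hoc reporting scans cannot monopolise the CPUs.
concurrent_reads: 16
concurrent_writes: 16
# Slow background compaction on the 2-disk JBOD pair from step B.
compaction_throughput_mb_per_sec: 8
EOF

grep -c 'concurrent_' "$RPT_YAML"
```

OS-level containment can back this up, e.g. pinning the reporting JVM to a subset of the 24 cores with `taskset -c 16-23` at launch, so even a runaway scan leaves the OLTP cores alone.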