Date: Tue, 17 Mar 2015 05:36:06 +0000 (UTC)
From: Anuj Wadehra
Reply-To: Anuj Wadehra
To: Ali Akhtar, user@cassandra.apache.org
Subject: Re: Run Mixed Workload using two instances on one node

I understand that 2 instances on one node looks like a weird solution, but we can have dedicated reporting nodes for big customers and not for small ones.

My questions would be:

1. What is the technical reasoning? What problems do you foresee if we use 2 C* instances on one node in production? We have ample HW on each server and it is mostly under-utilized. We just want heavy reporting not to impact OLTP, and both OLTP and reporting should be individually scalable.
2. I think we don't need Elasticsearch. We just need a plain Reporting DB which can answer reporting queries. We can create our own CFs as indexes. We don't need the overhead of another 3PP for our current reporting needs.

Thanks
Anuj


On Tuesday, 17 March 2015 9:59 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:

I don't think it's recommended to have two instances on the same node.
Have you considered using something like Elasticsearch for the reports? It's designed for that sort of thing.

On Mar 17, 2015 8:07 AM, "Anuj Wadehra" <anujw_2003@yahoo.co.in> wrote:

Hi,

We are trying to decouple our Reporting DB from OLTP. Need urgent help on the feasibility of the proposed solution for PRODUCTION.

Use Case: Currently, our OLTP and Reporting application and DB are the same. Some CFs are used for both OLTP and Reporting while others are used solely for Reporting. Every business transaction synchronously updates the main OLTP CF and asynchronously updates the other Reporting CFs.

Problem Statement:
1. Decouple Reporting and OLTP such that Reporting load can't impact OLTP performance.
2. Scaling of the Reporting and OLTP modules must be independent.
3. The OLTP client should not update all Reporting CFs. We generate Data Records on a file system/shared disk; Reporting should use these Records to build the Reporting DB.
4. Small customers may do OLTP and Reporting on the same 3-node cluster. Bigger customers can be given the option of dedicated OLTP and Reporting nodes. So a standard hardware box should be usable for 3 deployments (OLTP, Reporting, or OLTP+Reporting).

Note: Reporting is ad hoc, may involve full table scans, and does not involve analytics. Data size is huge: 2 TB (OLTP+Reporting) per node.

Hardware: Standard deployment is a 3-node cluster, each node having 24 cores, 64 GB RAM, and 6 x 400 GB SSDs in RAID5.

Proposed Solution:
1. Split the OLTP and Reporting clients into two application components.
2. For small deployments where more than 3 nodes are not required:
    A. Install 2 Cassandra instances on each node, one for OLTP and the other for Reporting.
    B. To distribute I/O load 2:1, remove RAID5 (as Cassandra offers replication) and assign 4 disks as JBOD for OLTP and 2 disks for Reporting.
    C. RAM is abundant and often under-utilized, so assign 8 GB to each of the 2 Cassandra instances.
    D. To make sure that Reporting is not able to overload the CPU, tune concurrent_reads and concurrent_writes.

The OLTP client will only write to the OLTP DB and generate DB records. The Reporting client will poll the file system and populate the Reporting DB in the required format.

3. Larger customers can have the Reporting client and DB on dedicated physical nodes with all resources.

Key Questions:
Is it OK to run 2 Cassandra instances on one node in a production system and limit CPU usage, disk I/O, and RAM as suggested above?
Any other solution for the above problem statement?

Thanks
Anuj
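For what it's worth, the per-instance isolation in steps A and C comes down to giving each JVM its own config tree, ports, directories, and heap. A minimal dry-run sketch of deriving the second config follows; every path and port number is an illustrative assumption (the stand-in cassandra.yaml is trimmed to just the relevant keys), not the stock layout of any particular Cassandra package:

```shell
# Sketch: derive a config tree for a second "reporting" Cassandra instance
# from the OLTP one. Sandboxed in a temp dir so the steps can be dry-run;
# all ports and directory names below are illustrative assumptions.
WORK=$(mktemp -d)
OLTP_CONF="$WORK/oltp"
RPT_CONF="$WORK/reporting"
mkdir -p "$OLTP_CONF" "$RPT_CONF"

# Stand-in for the cassandra.yaml shipped with the OLTP instance
# (only the keys relevant to running a second instance are shown).
cat > "$OLTP_CONF/cassandra.yaml" <<'EOF'
cluster_name: OLTP
storage_port: 7000
native_transport_port: 9042
data_file_directories:
    - /data/oltp
commitlog_directory: /commitlog/oltp
EOF

# The reporting instance gets its own cluster name, ports, and directories
# so the two JVMs never gossip with each other or contend for the same files.
sed -e 's/^cluster_name:.*/cluster_name: Reporting/' \
    -e 's/^storage_port:.*/storage_port: 7100/' \
    -e 's/^native_transport_port:.*/native_transport_port: 9142/' \
    -e 's#/data/oltp#/data/reporting#' \
    -e 's#/commitlog/oltp#/commitlog/reporting#' \
    "$OLTP_CONF/cassandra.yaml" > "$RPT_CONF/cassandra.yaml"

grep 'native_transport_port' "$RPT_CONF/cassandra.yaml"
```

In a real deployment the reporting instance would then be launched with CASSANDRA_CONF pointing at the derived directory and MAX_HEAP_SIZE=8G exported (step C), both of which the tarball launch scripts read.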
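On step D (keeping the reporting instance from starving OLTP of CPU and disk), the main cassandra.yaml knobs are the request thread pools (concurrent_reads and concurrent_writes default to 32) and the compaction throttle. A sketch with illustrative values for a rough 2:1 split; the numbers are assumptions to show the mechanism, not tuned recommendations:

```shell
# Sketch: a throttled cassandra.yaml fragment for the reporting instance.
# Values are illustrative assumptions for a 2:1 split, not tuned numbers.
WORK=$(mktemp -d)
RPT_YAML="$WORK/cassandra-reporting.yaml"

cat > "$RPT_YAML" <<'EOF'
# Smaller request thread pools than the OLTP instance's defaults (32/32)
# so ad-hoc reporting scans cannot monopolise the CPUs.
concurrent_reads: 16
concurrent_writes: 16
# Slow background compaction on the 2-disk JBOD pair from step B.
compaction_throughput_mb_per_sec: 8
EOF

grep -c 'concurrent_' "$RPT_YAML"
```

OS-level containment can back this up, e.g. pinning the reporting JVM to a subset of the 24 cores with `taskset -c 16-23` at launch, so even a runaway scan leaves the OLTP cores alone.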