From: jagernicolas@legtux.org
To: user@cassandra.apache.org
Date: Tue, 01 Oct 2019 15:30:55 +0000
Subject: Re: Sizing a cluster

Hi Léo, thanks for the links.

> Is that the size of the uncompressed data, or of the data once it has been inserted and compressed by Cassandra?

The 0.5 MB is the size of the data we send, before Cassandra applies any compression.
> Looking at Cassandra compression (http://cassandra.apache.org/doc/latest/operating/compression.html) and testing different parameters on a test cluster might be worthwhile before you size the final production cluster.
We are in the development phase and have two small clusters. I haven't taken compression into account yet. For compaction, I roughly assumed we need 50% extra space per node (that extra space is not included in the calculation from my last email).
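A minimal sketch of that compaction allowance, assuming "50% extra space" means disk = data × 1.5 and using decimal units (1 TB = 10^12 bytes). The 236.52 TB input is one year of replicated data for 5 sources (the ≈0.24 PB figure quoted below, before rounding); the function name is illustrative.

```python
TB = 10**12  # decimal terabyte (assumption: decimal units throughout)

def disk_with_headroom(data_bytes: float, extra_fraction: float = 0.5) -> float:
    """Disk needed when `extra_fraction` of the data size is reserved
    on top of the data itself for compaction."""
    return data_bytes * (1 + extra_fraction)

# One year of replicated data for 5 sources (illustrative input)
yearly_replicated = 236.52 * TB
print(f"{disk_with_headroom(yearly_replicated) / TB:.2f} TB")
```

With these inputs the cluster would need roughly 355 TB of raw disk for the first year, versus about 237 TB without headroom.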



On 1 October 2019 08:58, "Léo FERLIN SUTTON" <lferlin@mailjet.com.invalid> wrote:
Hi!
I'm not an expert, but don't forget that Cassandra needs space to do its compactions.
Take a look at the worst-case scenarios in this DataStax table: https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#capacityPlanning__disk
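If the worst case is stated as a fraction of the disk that must stay free (check the linked page for the figure that applies to your compaction strategy), the required disk is data / (1 − free_fraction). That differs from adding a fixed percentage on top of the data: keeping 50% of the disk free means twice the data size, not 1.5×. A small sketch, where the 2 TB per-node load and the two fractions are illustrative assumptions:

```python
TB = 10**12  # decimal terabyte (assumption)

def disk_for_free_fraction(data_bytes: float, free_fraction: float) -> float:
    """Disk capacity needed so that `free_fraction` of it is still free
    after storing `data_bytes`."""
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in [0, 1)")
    return data_bytes / (1 - free_fraction)

data_per_node = 2 * TB  # illustrative per-node data load
for p in (0.1, 0.5):
    disk = disk_for_free_fraction(data_per_node, p)
    print(f"keep {p:.0%} free -> {disk / TB:.2f} TB disk per node")
```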
> The size of a picture + data is about 0.5 MB
Is that the size of the uncompressed data, or of the data once it has been inserted and compressed by Cassandra?
Looking at Cassandra compression (http://cassandra.apache.org/doc/latest/operating/compression.html) and testing different parameters on a test cluster might be worthwhile before you size the final production cluster.
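Before testing on a cluster, one rough offline check is to deflate sample payloads in fixed-size chunks and compare sizes. This is only a proxy: it uses Python's `zlib` (deflate, roughly what Cassandra's DeflateCompressor uses), not the default LZ4Compressor, and the 16 KiB chunk size is an assumption (Cassandra's `chunk_length_in_kb` default depends on the version). Pictures such as JPEGs are already compressed, so their ratio is likely to stay close to 1.0; the sample payloads below are hypothetical.

```python
import os
import zlib

def chunked_deflate_ratio(payload: bytes, chunk_size: int = 16 * 1024, level: int = 6) -> float:
    """Compressed size / original size when `payload` is deflated in
    independent fixed-size chunks (a rough stand-in for block compression)."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = 0
    for start in range(0, len(payload), chunk_size):
        compressed += len(zlib.compress(payload[start:start + chunk_size], level))
    return compressed / len(payload)

# Hypothetical samples: random bytes behave like already-compressed image data,
# repetitive bytes like highly redundant metadata.
random_like = os.urandom(512 * 1024)
repetitive = b"camera-42;detection;" * 25_000
print(f"random-like: {chunked_deflate_ratio(random_like):.3f}")
print(f"repetitive:  {chunked_deflate_ratio(repetitive):.3f}")
```

Running it on a few real detection payloads (picture plus metadata) would give a ratio to multiply into the sizing estimate.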
Regards,
Leo
On Tue, Oct 1, 2019 at 1:40 PM <jagernicolas@legtux.org> wrote:
Hi,
We want to use Cassandra to store camera detections. The size of a picture + data is about 0.5 MB. We are starting with 5 devices, but we are targeting 50 devices next year and could go up to 1000. To summarize:
  • Number of sources: 5 / 50 / 1000 (src)
  • Frequency of data: 1 Hz (f)
  • Estimated size of data: 0.5 MB (s)
  • Replication factor: 3 (RF)

I calculated the size per year:
  • src * f * 60 * 60 * 24 * 365 * s * RF

which gives:
  • 5 sources = 0.24 PB per year
  • 50 sources = 2.4 PB per year
  • 1000 sources = 47.3 PB per year
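The yearly volumes can be reproduced with a few lines of Python. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and a 365-day year; the figures above only come out this way when the replication factor is included in the product.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
MB = 10**6   # decimal megabyte (assumption)
PB = 10**15  # decimal petabyte (assumption)

def yearly_volume(sources: int, freq_hz: float, size_bytes: float, rf: int) -> float:
    """Bytes stored per year across the cluster, replicas included:
    src * f * 60 * 60 * 24 * 365 * s * RF."""
    return sources * freq_hz * SECONDS_PER_YEAR * size_bytes * rf

for src in (5, 50, 1000):
    volume = yearly_volume(src, freq_hz=1, size_bytes=0.5 * MB, rf=3)
    print(f"{src:>4} sources = {volume / PB:.2f} PB per year")
```

This prints 0.24, 2.37 and 47.30 PB per year, which match the list above after rounding.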

So if I respect the 2 TB-per-node rule, I get 120 nodes in the simplest case (5 sources). Am I right?
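The node count is the yearly replicated volume divided by the per-node data target, rounded up. A minimal sketch, assuming the 2 TB target applies to data per node (compaction headroom and compression not included) and decimal units; the variable names are illustrative.

```python
import math

TB = 10**12  # decimal terabyte (assumption)

def nodes_needed(total_bytes: float, data_per_node_bytes: float) -> int:
    """Smallest node count that keeps each node at or below the data-density target."""
    return math.ceil(total_bytes / data_per_node_bytes)

yearly_5_sources = 236.52 * TB  # ≈0.24 PB per year, replicas included
print(nodes_needed(yearly_5_sources, 2 * TB))
print(nodes_needed(240 * TB, 2 * TB))  # same, starting from the rounded 0.24 PB
```

The unrounded 236.52 TB gives 119 nodes, while the rounded 0.24 PB gives 120. Either way, this covers one year of data only.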

Regards,
Nicolas Jäger