From: Léo FERLIN SUTTON
Date: Tue, 1 Oct 2019 14:57:58 +0200
Subject: Re: Sizing a cluster
To: user@cassandra.apache.org

Hi !

I'm not an expert, but don't forget that Cassandra needs free disk space to run its compactions.

Take a look at the worst-case scenarios in this DataStax table: https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#capacityPlanning__disk
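
To make the headroom point concrete, here is a rough sketch of how reserved free space inflates the disk you need per node. The 50% default is only an assumption on my side (a commonly quoted worst case for size-tiered compaction), and the function name is made up for illustration; take the real figure for your compaction strategy from the DataStax page above.

```python
def disk_needed_tb(data_tb: float, free_fraction: float = 0.5) -> float:
    """Raw disk needed so that `free_fraction` of it stays free for compaction.

    free_fraction=0.5 is an assumed worst case, not a measured value.
    """
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in [0, 1)")
    return data_tb / (1 - free_fraction)


# 2 TB of data per node with half of the disk kept free
print(disk_needed_tb(2.0))  # 4.0
```

In other words, under that assumption a node holding 2 TB of data would want about 4 TB of disk.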

> The size of a picture + data is about 0.5MB

Is that the size of the uncompressed data, or of the data once it has been inserted and compressed by Cassandra?
Looking at Cassandra's compression options (http://cassandra.apache.org/doc/latest/operating/compression.html) and testing different parameters on a test cluster might be worthwhile before you size the final production cluster.
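
Before touching a test cluster, a quick local check can hint at whether compression will help at all. The sketch below uses Python's zlib (DEFLATE) purely as a stand-in; Cassandra's own compressors (LZ4 by default, as far as I know) will give different numbers, and the sample payloads are synthetic placeholders, not real camera data.

```python
import os
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Return compressed size / original size for one payload (lower is better)."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(zlib.compress(payload, level)) / len(payload)


# Stand-ins for real samples: random bytes behave like already-compressed
# images (e.g. JPEG), repeated text behaves like verbose metadata.
image_like = os.urandom(500_000)
metadata_like = b'{"camera": 1, "label": "car"}' * 1000

print(f"image-like ratio:    {compression_ratio(image_like):.2f}")
print(f"metadata-like ratio: {compression_ratio(metadata_like):.2f}")
```

If the pictures are already stored in a compressed format, the ratio will probably stay close to 1, so the 0.5MB estimate would barely shrink on disk. Running the same function on a few real detections should tell you more.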

Regards,

Leo

On Tue, Oct 1, 2019 at 1:40 PM <jagernicolas@legtux.org> wrote:
Hi,
We want to use Cassandra to store camera detections. The size of a picture + data is about 0.5MB. We are starting with 5 devices, but we are targeting 50 devices for next year, and could go up to 1000. To summarize:
  • Number of sources: 5 - 50 - 1000 (src)
  • Frequency of data: 1Hz (f)
  • Estimated size of data: 0.5MB (s)
  • Replication factor: 3 (RF)

I calculated the size per year,
  • src * f * 60 * 60 * 24 * 365 * s * RF

gives me,
  • 5 sources = 0.24 PB per year
  • 50 sources = 2.4 PB per year
  • 1000 sources = 47.3 PB per year

So if I respect the 2TB-per-node rule, I get about 120 nodes in the simplest case (5 sources). Am I right?
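
The arithmetic above can be checked with a short sketch. It assumes decimal units (1 TB = 1e6 MB) and multiplies in the replication factor, since that is what reproduces the quoted figures. The function names are made up for illustration, and the result covers one year of data only, with no allowance for compaction headroom or compression.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def yearly_volume_tb(sources: int, freq_hz: float = 1.0,
                     size_mb: float = 0.5, rf: int = 3) -> float:
    """Replicated data written per year, in TB (decimal: 1 TB = 1e6 MB)."""
    return sources * freq_hz * SECONDS_PER_YEAR * size_mb * rf / 1e6


def nodes_needed(volume_tb: float, tb_per_node: float = 2.0) -> int:
    """Node count if each node holds at most `tb_per_node` of data."""
    return math.ceil(volume_tb / tb_per_node)


for src in (5, 50, 1000):
    tb = yearly_volume_tb(src)
    print(f"{src:>4} sources: {tb / 1000:.2f} PB/year, {nodes_needed(tb)} nodes")
```

For 5 sources this comes out at roughly 0.24 PB per year and 119 nodes, which matches the estimate of about 120.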

regards,
Nicolas Jäger