From user-return-64528-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org  Tue Oct  1 21:49:08 2019
From: DuyHai Doan <doanduyhai@gmail.com>
Date: Tue, 1 Oct 2019 23:48:48 +0200
Message-ID: <CABNXB2DaR4ZLJ-metMe3ifr-a_ryKt1wzGPq=NfmE03fENh-mQ@mail.gmail.com>
Subject: Re: Cluster sizing for huge dataset
To: user <user@cassandra.apache.org>

The client wants to be able to access cold data (2 years old) in the
same cluster, so moving data to another system is not possible.

However, since we're using DataStax Enterprise, we can leverage Tiered
Storage and store old data on spinning disks to save on hardware costs.
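The idea of keeping recent data on SSD and moving older data to spinning disks comes down to an age threshold. The Python sketch below only illustrates that idea. It does not reproduce DSE's Tiered Storage configuration or API. The 90-day boundary, the `storage_tier` function, and the `"ssd"`/`"hdd"` labels are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical boundary: data younger than this stays on the SSD (hot) tier.
# This is an illustrative value, not a DSE setting.
HOT_TIER_MAX_AGE = timedelta(days=90)


def storage_tier(written_at: datetime, now: datetime) -> str:
    """Return 'ssd' for recent (hot) data and 'hdd' for older (cold) data."""
    age = now - written_at
    return "ssd" if age < HOT_TIER_MAX_AGE else "hdd"


# Usage: a 10-day-old row stays hot; a 2-year-old row goes to spinning disk.
now = datetime(2019, 10, 1, tzinfo=timezone.utc)
recent = storage_tier(now - timedelta(days=10), now)
two_years_old = storage_tier(now - timedelta(days=730), now)
```

The cold data stays in the same cluster and can still be queried, which matches the client's requirement. Only the disk it lives on changes.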

Regards

On Tue, Oct 1, 2019 at 9:47 AM Julien Laurenceau
<julien.laurenceau@pepitedata.com> wrote:
>
> Hi,
> Depending on the use case, you may also consider storage tiering, with fresh
> data on a hot tier (Cassandra) and older data on a cold tier (Spark/Parquet
> or Presto/Parquet). It would be a lot more complex, but it may fit the budget
> better, and you may be able to reuse some technology already present in your
> environment.
> You may even subsample during the transformation that offloads data from
> Cassandra, keeping one point out of 10 for older data, if subsampling makes
> sense for your data signal.
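Julien's subsampling suggestion can be illustrated with a small Python sketch. Points newer than a cutoff are kept as-is, and only every tenth older point is kept. The `subsample_old_points` function, the tuple layout of the points, and the cutoff are hypothetical. A real offload job (for example, one written in Spark) would apply the same rule per series while writing to Parquet.

```python
from datetime import datetime, timedelta


def subsample_old_points(points, cutoff, keep_every=10):
    """Keep every point at or after `cutoff`; keep one of every `keep_every` older points.

    `points` is a list of (timestamp, value) tuples sorted by timestamp.
    """
    result = []
    old_index = 0  # counts only the points older than the cutoff
    for ts, value in points:
        if ts >= cutoff:
            result.append((ts, value))
        else:
            if old_index % keep_every == 0:
                result.append((ts, value))
            old_index += 1
    return result


# Usage: 30 one-minute points, the first 20 of which are "old".
base = datetime(2019, 1, 1)
points = [(base + timedelta(minutes=i), i) for i in range(30)]
reduced = subsample_old_points(points, cutoff=base + timedelta(minutes=20))
```

In this example the 20 old points shrink to 2 (values 0 and 10), and the 10 recent points are kept unchanged.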
>
> Regards
> Julien
>
> On Mon, Sep 30, 2019 at 22:03, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>> Thanks all for your replies.
>>
>> The target deployment is on Azure, so with the nice disk snapshot feature,
>> replacing a dead node is easier: no streaming from Cassandra is needed.
>>
>> As for compaction overhead, using TWCS with a 1-day bucket and disabling
>> read repair and subrange repair should be sufficient.
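TWCS groups SSTables into time windows, and a 1-day bucket corresponds to the table options `'compaction_window_unit': 'DAYS'` and `'compaction_window_size': 1`. The Python sketch below shows, in simplified form, how a write time maps to the start of its 1-day window. It uses plain epoch seconds in UTC and ignores Cassandra's microsecond cell timestamps. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Window length for compaction_window_unit = 'DAYS', compaction_window_size = 1.
WINDOW_SECONDS = 24 * 60 * 60


def twcs_window_start(write_time: datetime) -> datetime:
    """Return the UTC start of the 1-day window that `write_time` falls into."""
    if write_time.tzinfo is None:
        # Naive datetimes would be interpreted in local time by .timestamp().
        raise ValueError("write_time must be timezone-aware")
    epoch_seconds = int(write_time.timestamp())
    window_start = epoch_seconds - (epoch_seconds % WINDOW_SECONDS)
    return datetime.fromtimestamp(window_start, tz=timezone.utc)


# Usage: two writes on the same UTC day share a window; the next day opens a new one.
a = twcs_window_start(datetime(2019, 9, 30, 1, 0, tzinfo=timezone.utc))
b = twcs_window_start(datetime(2019, 9, 30, 22, 3, tzinfo=timezone.utc))
c = twcs_window_start(datetime(2019, 10, 1, 0, 5, tzinfo=timezone.utc))
```

Once a day's window closes, its SSTables are no longer compacted together with newer data. With 2 years of retention, this bucket size produces roughly 2 × 365 = 730 daily windows.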
>>
>> Now the only remaining issue is QUORUM reads, which trigger read repair
>> automagically.
>>
>> Unfortunately, before 4.0 there is no flag to turn it off.
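A QUORUM read needs answers from floor(RF/2) + 1 replicas. When those answers disagree, the coordinator reconciles them and writes the newest data back to the stale replicas before it answers the client. The Python sketch below models this with a simplified row-level, last-write-wins view. Real Cassandra first compares digests and then reconciles cell by cell. `ReplicaResponse`, `resolve_quorum_read`, and the node names are hypothetical. (Cassandra 4.0 adds a `read_repair` table option that can be set to `'NONE'`.)

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReplicaResponse:
    replica: str          # hypothetical replica identifier, e.g. a node name
    value: str
    write_timestamp: int  # write timestamp in microseconds


def quorum_size(replication_factor: int) -> int:
    """Number of replicas a QUORUM read must hear from."""
    return replication_factor // 2 + 1


def resolve_quorum_read(responses: List[ReplicaResponse]) -> Tuple[str, List[str]]:
    """Return the newest value (last write wins) and the replicas holding older data.

    In this simplified model, the stale replicas are the ones a blocking read
    repair would update before the coordinator replies.
    """
    newest = max(responses, key=lambda r: r.write_timestamp)
    stale = [r.replica for r in responses if r.write_timestamp < newest.write_timestamp]
    return newest.value, stale


# Usage: RF=3, so QUORUM contacts 2 replicas; one of them has an older version.
value, to_repair = resolve_quorum_read([
    ReplicaResponse("node-a", "v2", 1569966539000000),
    ReplicaResponse("node-b", "v1", 1569880139000000),
])
```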
>>
>> On Sep 30, 2019 at 15:47, "Eric Evans" <john.eric.evans@gmail.com> wrote:
>>
>> On Sat, Sep 28, 2019 at 8:50 PM Jeff Jirsa <jjirsa@gmail.com> wrote:
>>
>> [ ... ]
>>
>> > 2) The 2TB guidance is old and irrelevant for most people; what you really
>> > care about is how fast you can replace the failed machine.
>> >
>> > You’d likely be OK going significantly larger than that if you use a few
>> > vnodes, since that’ll help rebuild faster (you’ll stream from more sources
>> > on rebuild).
>> >
>> > If you don’t want to use vnodes, buy big machines and run multiple
>> > Cassandra instances on them; it’s not hard to run 3-4TB per instance and
>> > 12-16TB of SSD per machine.
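Vnodes speed up a rebuild because the replacement node streams from more sources. A back-of-the-envelope Python sketch makes this concrete, assuming each source streams at a fixed rate and the new node has a fixed ingest cap. The 50 MB/s and 400 MB/s figures and the function name are hypothetical placeholders, not measured Cassandra numbers. In practice, settings such as `stream_throughput_outbound_megabits_per_sec`, compaction, and the network also limit throughput.

```python
def rebuild_hours(data_tb: float, sources: int, per_source_mb_s: float,
                  ingest_cap_mb_s: float) -> float:
    """Estimate the hours needed to stream `data_tb` terabytes onto a replacement node.

    Aggregate throughput is the sum over sources, capped by what the new node can ingest.
    """
    throughput_mb_s = min(sources * per_source_mb_s, ingest_cap_mb_s)
    data_mb = data_tb * 1_000_000  # decimal TB -> MB
    return data_mb / throughput_mb_s / 3600


# Usage: a 4TB instance rebuilt from 1 source versus 8 sources (hypothetical rates).
single = rebuild_hours(4, sources=1, per_source_mb_s=50, ingest_cap_mb_s=400)
many = rebuild_hours(4, sources=8, per_source_mb_s=50, ingest_cap_mb_s=400)
```

With these assumed rates, the rebuild takes about 22 hours from one source and under 3 hours from eight. Beyond eight sources the ingest cap dominates, so adding more sources stops helping.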
>>
>> We do this too. It's worth keeping in mind, though, that you'll still
>> have a 12-16TB blast radius in the event of a host failure. As host
>> density goes up, consider steps to make the host more robust (RAID,
>> redundant power supplies, etc.).
>>
>> --
>> Eric Evans
>> john.eric.evans@gmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
>>
