From: Peter Schuller
To: user@cassandra.apache.org
Subject: Re: Cassandra to store 1 billion small 64KB Blobs
Date: Fri, 23 Jul 2010 21:33:18 +0200

> We plan to use cassandra as a data storage on at least 2 nodes with RF=2
> for about 1 billion small files.
> We do have about 48TB discspace behind for each node.
>
> now my question is - is this possible with cassandra - reliable - means
> (every blob is stored on 2 jbods)..
>
> we may grow up to nearly 40TB or more on cassandra "storage" data ...
>
> anyone out did something similar?

Other than what Jonathan Shook mentioned, I'd expect one potential
problem to be the number of sstables. At 40 TB, the larger compactions
are going to take quite some time. How many memtables will be flushed
to disk during the time it takes to perform a ~40 TB compaction? That
may or may not be an issue depending on how fast writes will happen,
how large your memtables are (the bigger the better) and what your
reads will look like.

(This relates to another thread where I posted about concurrent
compaction, but right now Cassandra only does a single compaction at a
time.)

-- 
/ Peter Schuller
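
To put rough numbers on the memtable question above, here is a back-of-the-envelope sketch in Python. Only the ~40 TB figure comes from this thread; the compaction throughput, write rate and memtable size are made-up assumptions for illustration, so plug in your own measurements. It also assumes that every memtable flush produces one new sstable and that none of those get compacted while the single large compaction is running.

```python
# Rough estimate of how many memtable flushes (and thus new sstables)
# pile up while one large compaction is running.
# All rates and sizes below are HYPOTHETICAL; only the 40 TB data size
# is taken from the discussion.

def flushes_during_compaction(data_bytes, compaction_bytes_per_sec,
                              write_bytes_per_sec, memtable_bytes):
    """Return (compaction duration in seconds, number of memtable flushes)."""
    # Time to rewrite the whole data set at the given compaction throughput.
    compaction_seconds = data_bytes / compaction_bytes_per_sec
    # Bytes of new writes arriving during that time.
    bytes_written = compaction_seconds * write_bytes_per_sec
    # Each full memtable is flushed to disk as one sstable.
    flushes = bytes_written / memtable_bytes
    return compaction_seconds, flushes


TB = 10**12
MB = 10**6

seconds, flushes = flushes_during_compaction(
    data_bytes=40 * TB,                 # from the thread
    compaction_bytes_per_sec=100 * MB,  # assumed compaction throughput
    write_bytes_per_sec=10 * MB,        # assumed sustained write rate
    memtable_bytes=128 * MB,            # assumed memtable size
)

print("compaction takes about %.1f days" % (seconds / 86400))
print("memtables flushed meanwhile: about %d" % flushes)
```

With these made-up inputs the compaction runs for several days and tens of thousands of sstables accumulate in the meantime. Doubling the memtable size halves that count, which is why bigger memtables help here.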