Subject: Re: Cassandra to store 1 billion small 64KB Blobs
From: Michael Widmann <michael.widmann@gmail.com>
To: user@cassandra.apache.org
Date: Sat, 24 Jul 2010 09:05:32 +0200

Hi Peter,

We are trying to figure out how much data will be coming into Cassandra once it is in full operation.

Reads depend mostly on the hash values (the file names) of the binary blobs, not on the binary data itself. We will try to store the hash values "grouped" by their first byte (a-z, A-Z, 0-9). Writes will sometimes be very fast, depending on the workload and on the clients writing to the system.

Question: is concurrent compaction planned for the future?

Mike

2010/7/23 Peter Schuller <peter.schuller@infidyne.com>:
> > We plan to use cassandra as a data storage on at least 2 nodes with RF=2
> > for about 1 billion small files.
> > We do have about 48TB discspace behind for each node.
> >
> > now my question is - is this possible with cassandra - reliable - means
> > (every blob is stored on 2 jbods)..
> >
> > we may grow up to nearly 40TB or more on cassandra "storage" data ...
> >
> > anyone out did something similar?
>
> Other than what Jonathan Shook mentioned, I'd expect one potential
> problem to be the number of sstables. At 40 TB, the larger compactions
> are going to take quite some time. How many memtables will be flushed
> to disk during the time it takes to perform a ~40 TB compaction?
> That may or may not be an issue depending on how fast writes will happen,
> how large your memtables are (the bigger the better) and what your
> reads will look like.
>
> (This relates to another thread where I posted about concurrent
> compaction, but right now Cassandra only does a single compaction at a
> time.)
>
> --
> / Peter Schuller

--
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies
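[Editor's note] The "grouped" hash-key scheme described above can be sketched roughly as follows. This is a hypothetical illustration only: the thread does not specify the hash function or key layout, so SHA-256 hex digests and the `bucket/<digest>` key shape are assumptions. Note that a hex digest only uses the characters 0-9 and a-f, so grouping on its first byte yields 16 buckets; covering the full a-z, A-Z, 0-9 range mentioned above would require a different encoding of the hash (e.g. base62).

```python
import hashlib

def bucket_for(file_name: str) -> str:
    """Derive a one-character bucket from the first byte of the hex digest.

    Hex digests use only 0-9a-f, so this gives 16 buckets in practice.
    """
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    return digest[0]

def row_key(file_name: str) -> str:
    """Hypothetical row key '<bucket>/<full-digest>' grouping blobs by prefix."""
    digest = hashlib.sha256(file_name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest}"

# Blobs whose digests share a first byte land in the same group.
for name in ("a.bin", "b.bin", "c.bin"):
    print(row_key(name))
```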
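[Editor's note] Peter's question about memtable flushes during a long compaction can be put into rough numbers. All of the rates below are made-up assumptions for illustration; the thread gives no actual throughput figures.

```python
def flushes_during_compaction(compaction_bytes: float,
                              compact_rate_bytes_s: float,
                              write_rate_bytes_s: float,
                              memtable_bytes: float) -> float:
    """Estimate how many memtables flush while a single compaction runs."""
    compaction_seconds = compaction_bytes / compact_rate_bytes_s
    bytes_written_meanwhile = write_rate_bytes_s * compaction_seconds
    return bytes_written_meanwhile / memtable_bytes

TB = 1024 ** 4
MB = 1024 ** 2

# Assumed figures: a 40 TB compaction at 100 MB/s, 20 MB/s of incoming
# writes, and 128 MB memtables.
n = flushes_during_compaction(40 * TB, 100 * MB, 20 * MB, 128 * MB)
print(round(n))  # -> 65536 new sstables accumulate during one compaction
```

Under these assumed rates, tens of thousands of new sstables pile up before the compaction finishes, which is why single-threaded compaction matters at this scale.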
