Subject: Can Cassandra make real use of several DataFileDirectories?
From: Roland Hänel
To: user@cassandra.apache.org
Date: Mon, 26 Apr 2010 10:27:46 +0200

I have a configuration like this:

  <DataFileDirectories>
      <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
      <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
  </DataFileDirectories>

After loading a big chunk of data into Cassandra, I end up with some 70GB in the first directory, and only about 10GB in the second and third ones. All rows are quite small, so it's not just a few big rows that contain the majority of the data.

Does Cassandra have the ability to 'see' the maximum available space in these directories? I'm asking myself this question since my limit is 100GB, and the first directory is approaching this limit...

And wouldn't it be better if Cassandra tried to 'load-balance' the files across the directories, since this would result in better (read) performance if the directories are on different disks (which is the case for me)?
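
To make the 'load-balance' idea concrete, here is a minimal Python sketch of the kind of policy I have in mind: before writing a new data file, look at the free space behind each configured directory and pick the one with the most room. This is only an illustration of the question, not Cassandra's actual behaviour or API. The names `free_bytes` and `pick_data_directory` are hypothetical, and the only real library call used is `shutil.disk_usage` from the Python standard library.

```python
import shutil
from typing import Callable, Iterable


def free_bytes(path: str) -> int:
    """Free space in bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def pick_data_directory(
    directories: Iterable[str],
    free_space: Callable[[str], int] = free_bytes,
) -> str:
    """Return the directory whose filesystem currently has the most free space.

    `free_space` is injectable so the policy can be tried out with made-up
    numbers; by default it asks the operating system.
    Hypothetical helper, not part of Cassandra.
    """
    candidates = list(directories)
    if not candidates:
        raise ValueError("no data directories configured")
    # max() keeps the first candidate when several are tied.
    return max(candidates, key=free_space)
```

With my numbers (a 100GB limit per directory, roughly 70GB used in the first and 10GB in each of the others), such a policy would place the next file in /storage02 or /storage03 rather than in /storage01.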

Any help is appreciated.

Roland
