From: Roland Hänel
To: user@cassandra.apache.org
Date: Mon, 26 Apr 2010 21:01:48 +0200
Subject: Re: Can Cassandra make real use of several DataFileDirectories?

Hm... I understand that RAID0 would help to create a bigger pool for
compactions. However, it might impact read performance: if I have several
CFs (with their SSTables), random read requests for the CF files that are
on separate disks will behave nicely. However, if it's RAID0, then a random
read on any file will create a random read on all of the hard disks.
Correct?

-Roland
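
The premise of the question (that one random read on RAID0 hits every disk)
can be checked with a little arithmetic. The sketch below is only a model of
plain striping: the 64 KiB chunk size, the three disks, and the function name
disks_touched are assumptions for illustration, and it ignores read-ahead,
filesystem layout, and controller behaviour.

```python
def disks_touched(offset, length, chunk_size=64 * 1024, num_disks=3):
    """Return the disk indices a read of `length` bytes at `offset` would hit.

    Models plain RAID-0 striping: chunk k of the array lives on disk
    k % num_disks. chunk_size and num_disks are illustrative assumptions.
    """
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return {chunk % num_disks for chunk in range(first_chunk, last_chunk + 1)}


# A 4 KiB read that stays inside one chunk touches a single disk.
small_read = disks_touched(offset=0, length=4096)

# A read that straddles a chunk boundary touches two disks.
straddling_read = disks_touched(offset=64 * 1024 - 10, length=20)

# A read spanning three or more chunks touches all three disks.
large_read = disks_touched(offset=0, length=3 * 64 * 1024)
```

Under this model, only reads larger than a chunk (or crossing a chunk
boundary) involve more than one disk; whether that holds on a given array
depends on its actual chunk size and the size of the reads.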

2010/4/26 Jonathan Ellis <jbellis@gmail.com>
http://wiki.apache.org/cassandra/CassandraHardware

On Mon, Apr 26, 2010 at 1:06 PM, Edmond Lau <edmond@ooyala.com> wrote:
> Ryan -
>
> You (or maybe someone else) mentioned using RAID-0 instead of multiple
> data directories at the Cassandra hackathon as well. Could you
> explain the motivation behind that?
>
> Thanks,
> Edmond
>
> On Mon, Apr 26, 2010 at 9:53 AM, Ryan King <ryan@twitter.com> wrote:
>> I would recommend using RAID-0 rather than multiple data directories.
>>
>> -ryan
>>
>> 2010/4/26 Roland Hänel <roland@haenel.me>:
>>> I have a configuration like this:
>>>
>>>   <DataFileDirectories>
>>>       <DataFileDirectory>/storage01/cassandra/data</DataFileDirectory>
>>>       <DataFileDirectory>/storage02/cassandra/data</DataFileDirectory>
>>>       <DataFileDirectory>/storage03/cassandra/data</DataFileDirectory>
>>>   </DataFileDirectories>
>>>
>>> After loading a big chunk of data into Cassandra, I end up with some 70GB in
>>> the first directory, and only about 10GB in the second and third one. All
>>> rows are quite small, so it's not just some big rows that contain the
>>> majority of data.
>>>
>>> Does Cassandra have the ability to 'see' the maximum available space in
>>> these directories? I'm asking myself this question since my limit is 100GB,
>>> and the first directory is approaching this limit...
>>>
>>> And, wouldn't it be better if Cassandra tried to 'load-balance' the files
>>> inside the directories because this will result in better (read) performance
>>> if the directories are on different disks (which is the case for me)?
>>>
>>> Any help is appreciated.
>>>
>>> Roland
>>>
>>>
>>
>
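
The 'see the available space' and 'load-balance' idea from the original
question can be sketched as picking, for each new file, the directory with
the most free space. This is a hypothetical illustration, not Cassandra's
actual placement logic: pick_data_directory and its free_space parameter are
invented names, and the only real library call used is Python's
shutil.disk_usage.

```python
import shutil


def pick_data_directory(directories, required_bytes,
                        free_space=lambda d: shutil.disk_usage(d).free):
    """Return the directory with the most free space (hypothetical helper).

    free_space is injectable so the policy can be exercised without real
    disks; by default it asks the operating system via shutil.disk_usage.
    """
    if not directories:
        raise ValueError("no data directories configured")
    # Pair each directory with its free bytes and take the roomiest one.
    free, best = max((free_space(d), d) for d in directories)
    if free < required_bytes:
        raise OSError(f"no directory has {required_bytes} bytes free")
    return best


# Example with made-up free-space figures (in GB) instead of real disks:
# the first directory is nearly full, the other two have room.
fake_free = {
    "/storage01/cassandra/data": 30,
    "/storage02/cassandra/data": 90,
    "/storage03/cassandra/data": 80,
}
chosen = pick_data_directory(list(fake_free), 50, free_space=fake_free.get)
```

A policy like this evens out disk usage over time, but it does nothing for
files already written, and it does not give compaction the single large pool
of space that the RAID-0 suggestion is aiming for.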
