From: Avinash Lakshman
To: user@cassandra.apache.org
Date: Wed, 14 Apr 2010 19:25:11 -0700
Subject: Re: Is that possible to write a file system over Cassandra?

Exactly. You can split a file into blocks of any size, and you can actually distribute the metadata across a large set of machines. You wouldn't have the small-files issue with this approach. The issue may be eventual consistency - I'm not sure that is a paradigm that would be acceptable for a file system. But that is a discussion for another time/day.

Avinash
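A minimal sketch of the block-splitting layout described above, in Python. The ColumnFamilyStub class is an in-memory stand-in for a real Cassandra column family client, and the Blocks/Inodes split, the 64 KB block size, and the key scheme are illustrative assumptions rather than anything prescribed in this thread:

import hashlib

class ColumnFamilyStub(object):
    """In-memory stand-in for a Cassandra column family client.
    A real client would send these insert/get calls to the cluster."""
    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        return self.rows[key]

# Illustrative column families: one for raw blocks, one for per-file metadata.
blocks = ColumnFamilyStub()
inodes = ColumnFamilyStub()

BLOCK_SIZE = 64 * 1024  # tunable block size; 64 KB is just an example

def write_file(path, data):
    """Split a file into fixed-size blocks, store each block under a
    content-derived key, and record the ordered block list as metadata."""
    block_keys = []
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]
        key = hashlib.sha1(chunk).hexdigest()  # block keys spread across the ring
        blocks.insert(key, {'data': chunk})
        block_keys.append(key)
    # Metadata row: the file path maps to its size and ordered block keys.
    inodes.insert(path, {'size': str(len(data)),
                         'blocks': ','.join(block_keys)})

def read_file(path):
    """Reassemble a file by fetching its metadata row, then each block."""
    meta = inodes.get(path)
    keys = meta['blocks'].split(',') if meta['blocks'] else []
    return b''.join(blocks.get(k)['data'] for k in keys)

if __name__ == '__main__':
    payload = b'x' * (3 * BLOCK_SIZE + 123)
    write_file('/tmp/example.bin', payload)
    assert read_file('/tmp/example.bin') == payload

Hashing each block spreads the block rows over many machines, while the small per-file metadata row keeps the ordered list needed to reassemble the file, which is the "distribute the metadata" point above.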
On Wed, Apr 14, 2010 at 7:15 PM, Ken Sandney wrote:

> Large files can be split into small blocks, and the block size can be
> tuned. It may increase the complexity of writing such a file system, but
> it could then serve general-purpose use (not only relatively small files).
>
> On Thu, Apr 15, 2010 at 10:08 AM, Tatu Saloranta wrote:
>
>> On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi wrote:
>>
>> > Hi,
>> > Cassandra has a good distributed model: decentralized, auto-partitioning,
>> > auto-recovery. I am evaluating writing a file system over Cassandra
>> > (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
>> > Cassandra is a good fit for such a use case.
>>
>> It sort of depends on what you are looking for. For use cases where
>> something like S3 is a good fit, yes, with one difference: Cassandra is
>> more geared towards lots of small files, whereas S3 is geared towards a
>> moderate number of (possibly large) files.
>>
>> So I think it can definitely be a good use case, and I may use Cassandra
>> for this myself in the future. Having range queries allows implementing
>> directory/path structures (list keys using the path as a prefix). And you
>> can split storage so that metadata lives under an order-preserving
>> partitioner (OPP) and raw data under the random partitioner (RP).
>>
>> -+ Tatu +-
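A sketch of the prefix-listing idea from Tatu's reply: under an order-preserving partitioner, row keys sort lexicographically, so listing a directory becomes a key-range scan from the path prefix up to a sentinel just past it. The sorted in-memory stand-in below only illustrates the key layout (a real client's key-range query would take its place), and the sample paths are made up:

import bisect

class OrderedRowsStub(object):
    """Stand-in for row keys under an order-preserving partitioner:
    keys are kept sorted, so range scans by key are possible."""
    def __init__(self):
        self.keys = []

    def insert(self, key):
        if key not in self.keys:
            bisect.insort(self.keys, key)

    def key_range(self, start, finish):
        # Return keys in [start, finish), mimicking a key-range scan.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, finish)
        return self.keys[lo:hi]

def list_directory(rows, path):
    """List entries directly under `path` by scanning keys with that prefix.
    '\xff' serves as a sentinel sorting just past any key sharing the prefix."""
    prefix = path if path.endswith('/') else path + '/'
    entries = set()
    for key in rows.key_range(prefix, prefix + '\xff'):
        remainder = key[len(prefix):]
        entries.add(remainder.split('/', 1)[0])  # immediate child only
    return sorted(entries)

if __name__ == '__main__':
    metadata = OrderedRowsStub()
    for p in ['/home/a/notes.txt', '/home/a/photos/cat.jpg', '/home/b/todo']:
        metadata.insert(p)
    print(list_directory(metadata, '/home/a'))  # ['notes.txt', 'photos']

This is also why the metadata rows benefit from the order-preserving partitioner while the raw blocks, which are only ever fetched by exact key, can sit under the random partitioner.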