Subject: Re: Storing files in blob into Cassandra
From: Damien Picard
To: user@cassandra.apache.org
Date: Wed, 22 Jun 2011 14:43:23 +0200

In this case, the load balancer has to detect (or be told through its configuration) that the server is down and stop routing requests to it.

2011/6/22 aaron morton

>> If the Cassandra JVM is down, Tomcat and Httpd will continue to handle
>> requests. And Pelops will redirect these requests to another Cassandra node
>> on another server (maybe I am wrong about this).
>
> I was thinking of the server being turned off / broken / rebooting /
> disconnected from the network / taken out of rotation for maintenance. There
> are lots of reasons for a server to not be doing what it should be.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22 Jun 2011, at 23:10, Damien Picard wrote:
>
> 2011/6/22 aaron morton
>
>>> I think I have to detail my configuration. On every server of my cluster,
>>> I deploy:
>>>  - a Cassandra node
>>>  - a Tomcat instance
>>>  - the webapp, deployed on Tomcat
>>>  - Apache httpd, in front of Tomcat with mod_jakarta
>>
>> You will have a bunch of services on the machine competing with each other
>> for resources (CPU, memory and network IO). It's not an approach I would
>> take.
>>
>> You will also tightly couple the front-end HTTP capacity to the DB
>> capacity. E.g. consider what happens when a Cassandra node is down for a
>> while: what does this mean for your ability to accept HTTP connections?
>
> If the Cassandra JVM is down, Tomcat and Httpd will continue to handle
> requests. And Pelops will redirect these requests to another Cassandra node
> on another server (maybe I am wrong about this).
>
>> Requests from your web app may go to the local Cassandra node, but that's
>> just the coordinator. They will be forwarded on to the replicas that contain
>> the data.
>
> Yes, but as you noted before, this node can be down, so I will configure
> Pelops to redistribute requests to another node. So there is no strong
> coupling between Cassandra and Tomcat; it will work as if they were on
> different servers.
>
>>> Data are stored with RandomPartitioner, replication factor is 2.
>>
>> RF 3 is the minimum RF you need to use for QUORUM to be less than the RF.
>
> Thank you for this advice; I will reconsider the RF, but for now I use
> only CL.ONE, not QUORUM. That could change in the near future.
>
>>> In such a case, do you advise me to store files in Cassandra?
>>
>> Depends on your scale, workload and performance requirements. I would do
>> some tests about how much data you expect to hold and what sort of workloads
>> you need to support. Personally I think files are best kept in a file
>> system, until a compelling reason is found to do otherwise.
>
> Thank you. I think that having to distribute files across the cluster with
> something like a distributed file system is a compelling reason to store
> files in Cassandra. I don't want to add another complex component to my
> architecture.
>
>> Hope that helps.
>
> It does! A lot! Thank you.
>
>> On 22 Jun 2011, at 20:23, Damien Picard wrote:
>>
>> > store your images / documents / etc. somewhere and reference them
>> > in Cassandra. That's the consensus that's been bandied about on this
>> > list quite frequently
>>
>> Thank you for your answers.
>>
>> I think I have to detail my configuration. On every server of my cluster,
>> I deploy:
>>  - a Cassandra node
>>  - a Tomcat instance
>>  - the webapp, deployed on Tomcat
>>  - Apache httpd, in front of Tomcat with mod_jakarta
>>
>> In front of these, I use a round-robin DNS load balancer which balances
>> requests across every httpd.
>> Every Tomcat instance can access every Cassandra node, so any of them can
>> deal with any request.
>> Data are stored with RandomPartitioner, replication factor is 2.
>>
>> In my case, it would be very easy to store images in Cassandra because
>> these images would be accessible everywhere in my cluster. If I store images
>> in the file system, I have to replicate them manually (probably with a
>> distributed file system) on every server (quite complicated). This is why I
>> prefer to store files in Cassandra.
>>
>> According to Sylvain, the main thing to know is the maximum size of a file.
>> Since this is a web application, I can set this maximum file size to 10 MB
>> (the HTTP POST max size) without disappointing my users. Furthermore, most
>> of these files will not exceed 2 or 3 MB. In such a case, do you advise me
>> to store files in Cassandra?
>>
>> Thank you.
>>
>> 2011/6/22 Sylvain Lebresne
>>
>>> Let's be more precise in saying that this all depends on the
>>> expected size of the documents. If you know that the documents
>>> will be around a few hundred kilobytes on average and
>>> no more than a few megabytes (say < 5MB, even though there is
>>> no magic number), then storing them as blobs will work perfectly
>>> fine (which is not saying storing them externally with metadata in
>>> Cassandra won't, but using blobs can be simpler in some cases).
>>>
>>> I've very successfully stored tons of images as blobs in Cassandra.
>>> I just knew they couldn't get super big because the system wasn't
>>> allowing it.
>>>
>>> The point about size is that each time you fetch a document,
>>> Cassandra has to load it (entirely) into memory to return it.
>>>
>>> --
>>> Sylvain
>>>
>>> On Wed, Jun 22, 2011 at 9:22 AM, Sasha Dolgy wrote:
>>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Storing-photos-images-docs-etc-td6078278.html
>>> >
>>> > Of significance from that link (which was great until "feeling lucky"
>>> > was removed...):
>>> >
>>> > Google search for the terms cassandra large files + feeling lucky
>>> > http://www.google.com/search?q=cassandra+large+files&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
>>> >
>>> > Yields:
>>> > http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage
>>> >
>>> > --- store your images / documents / etc. somewhere and reference them
>>> > in Cassandra. That's the consensus that's been bandied about on this
>>> > list quite frequently. We employ a solution that uses Amazon S3 for
>>> > storage and Cassandra as the reference to the metadata and location
>>> > of the files. Works a treat.
>>> >
>>> > On Wed, Jun 22, 2011 at 9:07 AM, Damien Picard <picard.damien@gmail.com> wrote:
>>> >> Hi,
>>> >>
>>> >> I have to store some files (images, documents, etc.) for my users in a
>>> >> webapp. I use Cassandra for all of my data and I would like to know if it
>>> >> is a good idea to store these files as blobs in a Cassandra CF.
>>> >> Are there any contraindications, or special things to know to achieve this?
>>> >>
>>> >> Thank you
>>> >
>>
>> --
>> Damien Picard
>> Axeiya Services : http://axeiya.com/
>> gwt-ckeditor : http://code.google.com/p/gwt-ckeditor/
>> My book on GWT : http://axeiya.com/index.php/ouvrage-gwt.html
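Damien's point is that something in front of the servers must stop sending traffic to a machine that is down. Plain round-robin DNS does not check backend health by itself, and the same idea applies on the client side when a library fails over between Cassandra nodes, as Damien expects Pelops to do. Below is a minimal sketch of round-robin selection that skips hosts marked as down. The class name, the host names and the `mark_down`/`mark_up` signal are hypothetical illustrations, not the API of Pelops or of any load balancer.

```python
from itertools import cycle


class RoundRobinPool:
    """Hypothetical round-robin pool that skips hosts currently marked as down."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one host")
        self._hosts = list(hosts)
        self._down = set()
        self._ring = cycle(self._hosts)  # endless a, b, c, a, b, c, ...

    def mark_down(self, host):
        # Assumption: in a real deployment a failed health check or request triggers this.
        self._down.add(host)

    def mark_up(self, host):
        self._down.discard(host)

    def next_host(self):
        # Look at each host at most once per call; fail if every host is down.
        for _ in range(len(self._hosts)):
            host = next(self._ring)
            if host not in self._down:
                return host
        raise RuntimeError("no live hosts available")


pool = RoundRobinPool(["server1", "server2", "server3"])
pool.mark_down("server2")
print([pool.next_host() for _ in range(4)])  # ['server1', 'server3', 'server1', 'server3']
```

Skipping a dead host costs at most one extra step per call, so a single outage only shifts its share of requests onto the remaining servers.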
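Aaron's remark about RF and QUORUM follows from how a quorum is counted: a majority of the replicas, floor(RF / 2) + 1. The sketch below only illustrates that arithmetic and does not talk to Cassandra. It computes the quorum size for a given RF and how many replicas can be down while a QUORUM read or write still succeeds.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM operation: a strict majority."""
    if rf < 1:
        raise ValueError("replication factor must be >= 1")
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be unavailable while QUORUM still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}, tolerates {tolerated_failures(rf)} down replica(s)")
# RF=1: QUORUM=1, tolerates 0 down replica(s)
# RF=2: QUORUM=2, tolerates 0 down replica(s)
# RF=3: QUORUM=2, tolerates 1 down replica(s)
# RF=5: QUORUM=3, tolerates 2 down replica(s)
```

With RF 2, QUORUM needs both replicas, so losing one node makes QUORUM operations fail for the data it holds. At CL.ONE, which Damien uses for now, a single live replica is enough.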
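Sylvain's and Sasha's replies can be combined into one write path. Small files are kept inline as blobs. Anything above a threshold goes to external storage, and Cassandra keeps only the metadata and the file's location. Below is a minimal sketch under several assumptions:

- The 5 MB inline threshold is hypothetical. Sylvain stresses there is no magic number.
- The 10 MB cap is the HTTP POST limit Damien mentions.
- The two dictionaries stand in for a Cassandra column family and an external store such as S3.
- `save_file`, `load_file` and the `files/<id>` key scheme are illustrative names, not a real client API.

```python
import hashlib

MAX_UPLOAD_BYTES = 10 * 1024 * 1024   # Damien's 10 MB HTTP POST cap
INLINE_LIMIT_BYTES = 5 * 1024 * 1024  # hypothetical threshold; "no magic number"

# Hypothetical stand-ins: one row per file in a column family, plus an external blob store.
file_rows = {}
external_store = {}


def save_file(file_id, data):
    """Store small files inline and large ones externally, keeping metadata in the row."""
    size = len(data)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{file_id}: {size} bytes exceeds the upload limit")
    row = {"size": size, "sha1": hashlib.sha1(data).hexdigest()}
    if size <= INLINE_LIMIT_BYTES:
        # Inline blob: Cassandra would load this whole value into memory on every read.
        row["blob"] = data
    else:
        key = f"files/{file_id}"  # hypothetical external key scheme
        external_store[key] = data
        row["location"] = key     # the row keeps only a reference to the file
    file_rows[file_id] = row
    return row


def load_file(file_id):
    """Return file contents from the inline blob or from the external store."""
    row = file_rows[file_id]
    if "blob" in row:
        return row["blob"]
    return external_store[row["location"]]


small = save_file("avatar", b"x" * 1024)
large = save_file("manual", b"y" * (6 * 1024 * 1024))
print("blob" in small, "location" in large)  # True True
```

The threshold keeps the common 2 or 3 MB uploads simple to serve from Cassandra. Rarer large files do not have to be loaded whole into Cassandra's memory on each read.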