Date: Wed, 28 Apr 2010 13:49:28 +0800
Subject: Re: how to store file in the cassandra?
From: Jeff Zhang
To: user@cassandra.apache.org

Mark,

Thanks for your suggestion. It's really not a good idea to store one file in
multiple columns in one row: the heap space problem will still exist. I took
your advice and stored the file across multiple rows instead. It works, and I
can even store a single 2 GB file.

On Mon, Apr 26, 2010 at 6:12 PM, Mark Robson wrote:
> On 26 April 2010 00:57, Shuge Lee wrote:
>>
>> In Python:
>>
>> keyspace.columnfamily[key][column] = value
>>
>> files.video[uuid.uuid4()]['name'] = 'foo.flv'
>> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>
> Hi.
> Storing the filename in the database will not solve the file storage
> problem. Cassandra is a distributed database, and a file stored locally will
> not be available on other client nodes.
> If you're using Cassandra at all, that probably implies that you have lots
> of client nodes. A non-redundant NFS server (for example) would not offer
> high availability, so would be inadequate for the OP's situation.
> Storing files *IN* Cassandra is very useful because you can then retrieve
> them from anywhere with high availability.
> However, as others have discussed, they should be split across multiple
> columns, or if very big, multiple rows.
> I prefer to split by row because this scales better to very large files.
> During compaction, as is well noted, Cassandra needs the entire row in
> memory, which will cause a FAIL once you have files more than a few gigs.
> Mark

-- 
Best Regards

Jeff Zhang
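
P.S. For anyone finding this thread later, here is a rough sketch of the
row-per-chunk layout discussed above. It is not tied to any real Cassandra
client: a plain Python dict stands in for the column family, in the spirit of
the keyspace.columnfamily[key][column] notation quoted earlier. The names
store_file, load_file and CHUNK_SIZE, the "<file id>:<chunk index>" row key
format, and the 1 MiB chunk size are all assumptions made for illustration.

```python
import io
import uuid

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); pick one that fits your heap


def store_file(cf, stream, name, chunk_size=CHUNK_SIZE):
    """Store `stream` as one row per chunk and return the generated file id.

    `cf` is a hypothetical stand-in for a column family: a mapping of
    row key -> {column name: value}.
    """
    file_id = str(uuid.uuid4())
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        # Row key is "<file_id>:<chunk index>", so no single row holds the whole file.
        cf["%s:%d" % (file_id, index)] = {"data": data}
        index += 1
    # A small metadata row records the name and how many chunk rows to read back.
    cf[file_id] = {"name": name, "chunks": index}
    return file_id


def load_file(cf, file_id):
    """Reassemble the file by reading its chunk rows in order."""
    count = cf[file_id]["chunks"]
    return b"".join(cf["%s:%d" % (file_id, i)]["data"] for i in range(count))


# Usage: a plain dict plays the role of the column family.
cf = {}
payload = b"x" * 2500
fid = store_file(cf, io.BytesIO(payload), "foo.flv", chunk_size=1000)
assert load_file(cf, fid) == payload
```

Because each row holds at most one chunk, the largest row stays near the chunk
size no matter how big the file is, which is the property that matters given
the compaction behaviour Mark describes.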