Date: Tue, 20 Apr 2010 16:06:00 -0500
Subject: Re: Re: Modelling assets and user permissions
From: tsuraan
To: user@cassandra.apache.org

> It seems to me you might get by with putting the actual assets into
> cassandra (possibly breaking them up into chunks depending on how big
> they are) and storing the pointers to them in Postgres along with all
> the other metadata. If it were me, I'd split each file into a fixed
> chunksize and store it using its SHA1 checksum, and keep an ordered
> list of chunks that make up a file, then never delete a chunk. Given
> billions of documents you just may end up with some savings due to
> file chunks that are identical.

The retrieval of documents is pretty key (people like getting their files), so we store them on disk and use our http server's static file serving to send them out. I'm not sure what the best way to serve files stored in cassandra would be, but the free replication offered is interesting. Is cassandra a sane way to store huge amounts (many TB) of raw data? I saw in the limitations page that people are using cassandra to store files, but is it considered a good idea?

> You could partition the postgres tables and replicate the data to a
> handful of read-only nodes that could handle quite a bit of the work.
> I suppose it depends on your write-frequency how that might pan out as
> a scalability option.
Our system is pretty write-heavy; we currently do a bit under a million files a day (which translates to about 5x that number of DB records stored), but we're going for a few million per day.

Here's a quick question that should be answerable: If I have a CF with SuperColumns where one of the SuperColumns has keys that are the users allowed to see an asset, is it guaranteed to be safe to add keys to that SuperColumn? I noticed that each column has its own timestamp, so it doesn't look like I actually need to write a full row (which would introduce overwriting race-condition concerns). It looks like I can just use batch_mutate to add the keys that I want to the permissions SuperColumn. Is that correct, and would that avoid races?
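
To make the reasoning behind that question concrete, here is a toy model in Python. It is only a sketch: it assumes columns are reconciled one at a time, with the newest timestamp winning for each column name, and it does not use the Cassandra client API at all. The names `merge`, `permissions`, `write_a` and `write_b` are made up for illustration, and timestamp ties are not modelled.

```python
from typing import Dict, Tuple

# Hypothetical in-memory model, NOT the Cassandra/Thrift client API.
Column = Tuple[bytes, int]        # (value, timestamp)
SuperColumn = Dict[str, Column]   # subcolumn name -> column


def merge(stored: SuperColumn, incoming: SuperColumn) -> SuperColumn:
    """Reconcile column by column, keeping the newest timestamp per name.

    Assumption: ties on equal timestamps keep the stored column; a real
    cluster may resolve ties differently.
    """
    result = dict(stored)
    for name, (value, ts) in incoming.items():
        if name not in result or ts > result[name][1]:
            result[name] = (value, ts)
    return result


# The permissions SuperColumn: subcolumn names are the allowed users.
permissions: SuperColumn = {"alice": (b"", 100)}

# Two writers each add one user without reading the existing columns first.
write_a: SuperColumn = {"bob": (b"", 201)}
write_b: SuperColumn = {"carol": (b"", 202)}

# Apply the two writes in both possible orders.
order_1 = merge(merge(permissions, write_a), write_b)
order_2 = merge(merge(permissions, write_b), write_a)

print(sorted(order_1))     # ['alice', 'bob', 'carol']
print(order_1 == order_2)  # True
```

In this model neither writer ever rewrites the whole set of users, so neither can drop the other's addition, and the result does not depend on arrival order. Two writers touching the same user name would still be resolved by timestamp, so the model says nothing about that case beyond last-write-wins.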