From: Matteo Caprari
Date: Mon, 8 Mar 2010 12:18:16 +0000
Message-ID: <1bca98391003080418q26ff1616o47ea6c7540a6734b@mail.gmail.com>
Subject: schema design question
To: cassandra-user@incubator.apache.org

Hi.

We have a collection operation that generates documents like this:

item: {
  "id": "",
  "title": "...",
  "liked_by": ["user_2", "user_3", ...]
}

The liked_by list contains on average 100 unique users. Users may also
appear in other items.

Our database contains a few million entries and is growing at about 1M
a day. Around 10% of the incoming data is additional info about an
existing item (i.e. more likers), so a merge operation needs to be done.

We are not too happy with our current system and are considering
Cassandra. I'm new to this kind of db, and I'd like to hear a few
informed opinions on how to design a Cassandra schema.

Of course we want the system to keep up with the write/update rate and
answer our key queries 'as quickly as possible'.

The 'key' queries are:

- list all the items a user liked
- list all the users that liked an item
- list all users and count how many items each user liked (we need
  this every few hours, and in fact we are only interested in the top N
  users that liked the most items)

Thanks!

-- 
:Matteo Caprari
matteo.caprari@gmail.com
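
To make the three key queries and the merge step concrete, here is a minimal in-memory sketch in Python. It is not a Cassandra schema and uses no Cassandra API. It only models one possible shape of the data: one lookup per query direction, with the per-user count derived from the user lookup. The names `LikesIndex`, `merge`, `items_liked_by`, `users_who_liked` and `top_users` are hypothetical and exist only for illustration.

```python
from collections import defaultdict
import heapq


class LikesIndex:
    """In-memory stand-in for two lookups, one per query direction.

    Assumption: this only illustrates the data shape; it says nothing
    about how a real store would lay the data out or scale.
    """

    def __init__(self):
        self.users_by_item = defaultdict(set)  # item id -> users who liked it
        self.items_by_user = defaultdict(set)  # user id -> items the user liked

    def merge(self, item_id, liked_by):
        """Record likers for an item; re-sending known likers changes nothing."""
        for user in liked_by:
            self.users_by_item[item_id].add(user)
            self.items_by_user[user].add(item_id)

    def items_liked_by(self, user):
        """Query 1: all items a user liked."""
        return sorted(self.items_by_user.get(user, ()))

    def users_who_liked(self, item_id):
        """Query 2: all users that liked an item."""
        return sorted(self.users_by_item.get(item_id, ()))

    def top_users(self, n):
        """Query 3: the n users with the most liked items, as (user, count)."""
        counts = ((user, len(items)) for user, items in self.items_by_user.items())
        return heapq.nlargest(n, counts, key=lambda pair: (pair[1], pair[0]))


idx = LikesIndex()
idx.merge("item_1", ["user_2", "user_3"])
idx.merge("item_2", ["user_2"])
idx.merge("item_1", ["user_3", "user_4"])  # later batch with more likers
print(idx.users_who_liked("item_1"))
print(idx.top_users(1))
```

Because likers are stored as sets, applying the same batch twice leaves the data unchanged, which is the property the merge of "additional info about an item" seems to need. Whether the top-N count is better kept incrementally or recomputed every few hours is left open here.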