From: DE VITO Dominique
To: "user@cassandra.apache.org"
Date: Mon, 8 Apr 2013 17:57:56 +0200
Subject: data modeling from batch_mutate point of view

Hi,

 

I have a use case that amounts to storing data associated with files. I store that data in a CF laid out as follows:

rowkey = (folder_id, file_id)

colname = property name (of the file identified by file_id)

colvalue = property value

 

I also have CFs for "manual" indexing:

rowkey = (folder_id, indexed value)

colname = (timestamp, file_id)

colvalue = ""

 

For example:

rowkey = (folder_id, note_of_5) or (folder_id, some_status)

colname = (some_date, some_filename)

colvalue = ""
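To make the current layout concrete, here is a minimal Python sketch. It uses plain dicts as stand-ins for CFs, so it models the layout only and is not a Cassandra client API. The names `file_props`, `idx_by_status`, `store_file`, `files_with_status` and the sample values are hypothetical.

```python
from collections import defaultdict

# Plain dicts stand in for column families: {rowkey: {colname: colvalue}}.
# This models the layout only; it is not a Cassandra client API.
file_props = defaultdict(dict)      # data CF
idx_by_status = defaultdict(dict)   # one "manual" index CF (hypothetical name)


def store_file(folder_id, file_id, timestamp, props):
    """Write a file's properties and its entry in the status index."""
    # Data CF: rowkey = (folder_id, file_id), one column per property.
    file_props[(folder_id, file_id)].update(props)
    # Index CF: rowkey = (folder_id, indexed value),
    # colname = (timestamp, file_id), colvalue = "".
    idx_by_status[(folder_id, props["status"])][(timestamp, file_id)] = ""


def files_with_status(folder_id, status):
    """Return the file ids for one indexed value, ordered by timestamp."""
    row = idx_by_status.get((folder_id, status), {})
    return [file_id for (_, file_id) in sorted(row)]


store_file("folder1", "b.txt", 20, {"status": "draft", "note": "3"})
store_file("folder1", "a.txt", 10, {"status": "draft", "note": "5"})
store_file("folder1", "c.txt", 30, {"status": "final", "note": "5"})
print(files_with_status("folder1", "draft"))
```

Each distinct indexed value gets its own index row here, so looking up one value reads exactly one row.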

 

I have many indexing CFs, because I index on several different file properties.

 

So, one alternative design for the indexing CFs could be:

rowkey = folder_id

colname = (indexed value, timestamp, file_id)

colvalue = ""
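As a rough illustration of this alternative layout, here is a minimal Python sketch. A plain dict stands in for the index CF (this is not a Cassandra client API), and the names `idx_by_status_alt`, `index_file` and `files_with_status_alt` are hypothetical.

```python
from collections import defaultdict

# Alternative index layout, modeled with a plain dict (not a Cassandra API):
# rowkey = folder_id, colname = (indexed value, timestamp, file_id), colvalue = "".
idx_by_status_alt = defaultdict(dict)  # hypothetical CF name


def index_file(folder_id, status, timestamp, file_id):
    """Add one index column to the folder's single index row."""
    idx_by_status_alt[folder_id][(status, timestamp, file_id)] = ""


def files_with_status_alt(folder_id, status):
    """Emulate a column slice restricted to one value of the first component."""
    row = idx_by_status_alt.get(folder_id, {})
    # Cassandra keeps columns sorted by name within a row; sorted() mimics that.
    return [f for (value, _, f) in sorted(row) if value == status]


index_file("folder1", "draft", 20, "b.txt")
index_file("folder1", "final", 30, "c.txt")
index_file("folder1", "draft", 10, "a.txt")
print(files_with_status_alt("folder1", "draft"))
```

All indexed values for a folder now live in one row, so a lookup becomes a slice over the columns whose first component matches, and the indexed value is stored once per column.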

 

Alternative design:

* pro: same rowkey for all indexing CFs => **all** indexing CFs could be updated through one batch_mutate

* con: the "indexed value" (first component of the colname) is repeated again and again (a string of up to 20 characters)
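The trade-off above can be sketched in Python. The nested shape {row key: {CF name: [mutations]}} mirrors the mutation map that the Thrift batch_mutate call takes, but the tuples below are only stand-ins for real Mutation objects, and the CF names `idx_by_note` and `idx_by_status` as well as both helper functions are hypothetical.

```python
def mutation_map_current(folder_id, file_id, timestamp, indexed):
    """Current design: one row key per (folder_id, indexed value).

    indexed maps an index CF name to the value indexed for this file.
    """
    mmap = {}
    for cf, value in indexed.items():
        column = ((timestamp, file_id), "")  # (colname, colvalue)
        mmap.setdefault((folder_id, value), {}).setdefault(cf, []).append(column)
    return mmap


def mutation_map_alternative(folder_id, file_id, timestamp, indexed):
    """Alternative design: every index CF shares the row key folder_id."""
    mmap = {folder_id: {}}
    for cf, value in indexed.items():
        column = ((value, timestamp, file_id), "")  # indexed value moves into colname
        mmap[folder_id].setdefault(cf, []).append(column)
    return mmap


indexed = {"idx_by_note": "note_of_5", "idx_by_status": "some_status"}
current = mutation_map_current("folder1", "a.txt", 10, indexed)
alternative = mutation_map_alternative("folder1", "a.txt", 10, indexed)

# Number of distinct row keys touched by one file update in each design.
print(len(current), len(alternative))

# The "con": bytes added to the column names of this one update by repeating
# the indexed values, ignoring any composite-encoding overhead.
extra_bytes = sum(len(v.encode("utf-8")) for v in indexed.values())
print(extra_bytes)
```

With the alternative design the map always has a single row key regardless of how many index CFs are updated, while the extra column-name bytes grow with every indexed file.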

 

Weighing the pro against the con, is the alternative design more or less attractive than the current one?

 

Thanks.

 

Dominique

 

 
